ECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path

Similar documents
Project Part A: Single Cycle Processor

MIPS%Assembly% E155%

MIPS Instruction Reference

101 Assembly. ENGR 3410 Computer Architecture Mark L. Chang Fall 2009

F. Appendix 6 MIPS Instruction Reference

RTL Model of a Two-Stage MIPS Processor


CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CS31001 COMPUTER ORGANIZATION AND ARCHITECTURE. Debdeep Mukhopadhyay, CSE, IIT Kharagpur. Instructions and Addressing

Mips Code Examples Peter Rounce

CSc 256 Midterm 2 Fall 2011

Programming the processor

Final Project: MIPS-like Microprocessor

CSc 256 Midterm 2 Spring 2012

MIPS Reference Guide

Computer Architecture Experiment

Midterm. Sticker winners: if you got >= 50 / 67

MIPS Instruction Set

Welcome to CS250 VLSI Systems Design

Processor. Han Wang CS3410, Spring 2012 Computer Science Cornell University. See P&H Chapter , 4.1 4

CPU Design for Computer Integrated Experiment

CENG 3420 Lecture 06: Datapath

Computer Organization

CS222: MIPS Instruction Set

Week 10: Assembly Programming

EECS 151/251A: SPRING 17 MIDTERM 2 SOLUTIONS

Assembly Programming

CS222: Processor Design

ICS 233 COMPUTER ARCHITECTURE. MIPS Processor Design Multicycle Implementation

The MIPS Instruction Set Architecture

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA

CISC 662 Graduate Computer Architecture. Lecture 4 - ISA MIPS ISA. In a CPU. (vonneumann) Processor Organization

Computer Architecture. The Language of the Machine

Laboratory Exercise 6 Pipelined Processors 0.0

CS Computer Architecture Spring Week 10: Chapter

EASTERN MEDITERRANEAN UNIVERSITY COMPUTER ENGINEERING DEPARTMENT

R-type Instructions. Experiment Introduction. 4.2 Instruction Set Architecture Types of Instructions

Instruction Set Principles. (Appendix B)

EE108B Lecture 3. MIPS Assembly Language II

CS3350B Computer Architecture Winter 2015

Materials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming

Question 0. Do not turn this page until you have received the signal to start. (Please fill out the identification section above) Good Luck!

ECE 15B Computer Organization Spring 2011

ECE Exam I February 19 th, :00 pm 4:25pm

MIPS ISA. 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support

MIPS Instruction Format

ECE232: Hardware Organization and Design. Computer Organization - Previously covered

Lecture Topics. Announcements. Today: The MIPS ISA (P&H ) Next: continued. Milestone #1 (due 1/26) Milestone #2 (due 2/2)

EEM 486: Computer Architecture. Lecture 2. MIPS Instruction Set Architecture

CSc 256 Final Fall 2016

Examples of branch instructions

Systems Architecture

CAD4 The ALU Fall 2009 Assignment. Description

5/17/2012. Recap from Last Time. CSE 2021: Computer Organization. The RISC Philosophy. Levels of Programming. Stored Program Computers

Recap from Last Time. CSE 2021: Computer Organization. Levels of Programming. The RISC Philosophy 5/19/2011

Topic #6. Processor Design

Overview. Introduction to the MIPS ISA. MIPS ISA Overview. Overview (2)

INSTRUCTION SET COMPARISONS

ECE369. Chapter 5 ECE369

CS61CL Machine Structures. Lec 5 Instruction Set Architecture

Flow of Control -- Conditional branch instructions

A Processor. Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. See: P&H Chapter , 4.1-3

Programmable Machines

Materials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming

Exam in Computer Engineering

Programmable Machines

ECE 2300 Digital Logic & Computer Organization. More Single Cycle Microprocessor

Outline. EEL-4713 Computer Architecture Multipliers and shifters. Deriving requirements of ALU. MIPS arithmetic instructions

CPU Organization (Design)

Chapter 4. The Processor

Computer Architecture. MIPS Instruction Set Architecture

M2 Instruction Set Architecture

Instructions: MIPS ISA. Chapter 2 Instructions: Language of the Computer 1

Review of Last lecture. Review ALU Design. Designing a Multiplier Shifter Design Review. Booth s algorithm. Today s Outline

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Introduction to CMOS VLSI Design (E158) Lab 4: Controller Design

Computer Organization. Structure of a Computer. Registers. Register Transfer. Register Files. Memories

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Concocting an Instruction Set

IMPLEMENTATION MICROPROCCESSOR MIPS IN VHDL LAZARIDIS DIMITRIS.

Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , , Appendix B

Q1: /30 Q2: /25 Q3: /45. Total: /100

Reduced Instruction Set Computer (RISC)

ECE260: Fundamentals of Computer Engineering

UC Berkeley CS61C : Machine Structures

ECE 473 Computer Architecture and Organization Project: Design of a Five Stage Pipelined MIPS-like Processor Project Team TWO Objectives

SIDDHARTH INSTITUTE OF ENGINEERING AND TECHNOLOGY :: PUTTUR (AUTONOMOUS) Siddharth Nagar, Narayanavanam Road QUESTION BANK UNIT I

SPIM Instruction Set

Computer Science and Engineering 331. Midterm Examination #1. Fall Name: Solutions S.S.#:

ECE 2035 Programming HW/SW Systems Fall problems, 7 pages Exam Two 23 October 2013

COMP 303 MIPS Processor Design Project 3: Simple Execution Loop

ECE260: Fundamentals of Computer Engineering

Reduced Instruction Set Computer (RISC)

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture

Computer Science 61C Spring Friedland and Weaver. The MIPS Datapath

Computer Architecture

Block diagram view. Datapath = functional units + registers

COMPSCI 313 S Computer Organization. 7 MIPS Instruction Set

Lecture 4: MIPS Instruction Set

Transcription:

ECE410 Design Project Spring 2013 Design and Characterization of a CMOS 8-bit pipelined Microprocessor Data Path Project Summary This project involves the schematic and layout design of an 8-bit microprocessor data path, including an ALU, barrel shifter, and a register file using CMOS circuitry and Cadence VLSI design tools. The project will incorporate all the skills and knowledge students have gained from the individual lab assignments and provide a team of students with an opportunity to work on a more open-ended integrated circuits design problem. Data Path Overview The data path is the core element of a microprocessor that can perform many functions, including arithmetic and logic evaluation, data movement, and storage of results in a memory address specified by the instruction. In this design project, teams must design a complete 8-bit data path consisting of a register file, an ALU (arithmetic logic unit), shifter, memory unit, and program counter as shown in Figure 1. The data path will be capable of multiple instructions determined by several signals that comprise the operational code or opcode. The data path is pipelined, or broken into stages. Each stage is separated by a register (pipeline registers) containing all the signals required for operation. The pipeline registers provide latching for the signals between clock cycles. Figure 1: Structure of the microprocessor data path [1] [1] Patterson, David A., and John L. Hennessy. Computer organization and design: the hardware/software interface. Morgan Kaufmann, 2008.

The ALU can perform both arithmetic and bit-wise logical operations. The ALU will have two 8-bit data inputs and several control signals that specify the ALU function to be executed in a given clock cycle. The ALU input data is loaded from the pipeline register at the rising edge of the clock, and the output is generated after a delay due to propagation through the combinational logic of the ALU and the shifter. The central component of an ALU is an adder, and for this project a Manchester carry look-ahead adder will be used. The shifter provides data manipulation capabilities. In Lab 6, a simple shift register that can shift by one position per clock cycle was built. In this design project, a barrel shifter will be used because it has a very efficient layout and can perform n-bit shifts in a single clock cycle. A MUX is used to select the correct out from the ALU or the shifter. The output of the MUX goes to the input of the next pipeline register. The register file is a block of memory that provides the data inputs to and stores the ouputs from the ALU. The register file at read/write address is specified by bits included in the opcode. In this project, pipeline registers will be synchronized by a single clock that determines when new data is available to the register file and the ALU/shifter. Data Path The following specifications must be achieved by the data path you design Data width: 8-bits Inputs: 16-bit instructions from Instruction Memory 8-bit data from Data Memory Outputs: 8-bit program counter 8-bit address for Data memory 8-bit data to write to Data memory 1-bit Data memory write signal 1-bit Data memory read signal Power supply: VDD = 3V, GND = 0V Size specifications: No specific layout size is required, but optimization of physical size is a design priority, and higher scores will be awarded to groups with smaller layouts. The floorplan in Figure 2 will assist in achieving an optimal layout. Timing specifications: Singal timing requirements are defined below. The worst case propagation delay for all ALU+shifter functions must be less than 50ns. Instruction Set: All of the following instructions must be implemented The instruction set encoding is as follows:

I-Type instructions oooo osss ttti iiii BLTZ 1000 0sss 000i iiii Branch on Less than Zero BGEZ 1000 1sss 001i iiii Branch on Greater Than or Equal Zero BEQ 1010 0sss ttti iiii Branch on Equal BNE 1010 1sss ttti iiii Branch on Not Equal BLEZ 1011 0sss 000i iiii Branch on Less than or Equal Zero BGTZ 1011 1sss 000i iiii Branch on Greater than Zero ADDI 1100 0sss ttti iiii ADD immediate SLTI 1100 1sss ttti iiii Set on Less than Immediate ANDI 1101 0sss ttti iiii AND immediate ORI 1101 1sss ttti iiii OR immediate XORI 1110 0sss ttti iiii XOR immediate LUI 1110 1--- ttti iiii Load Upper Immediate LW 1111 0sss ttti iiii Load Word SW 1111 1sss ttti iiii Store Word J-Type instructions oooo oiii iiii iiii J 1001 0iii iiii iiii Jump JAL 1001 1iii iiii iiii Jump and Link R-Type instructions oooo osss tttd ddhh SLL 0000 1--- tttd ddhh Shift Left Logical SRL 0001 0--- tttd ddhh Shift Right Logical SRA 0001 1--- tttd ddhh Shift Right Arithmetic SLLV 0010 0sss tttd dd00 Shift Left Logical Variable SRLV 0010 1sss tttd dd00 Shift Right Logical Variable SRAV 0011 0sss tttd dd00 Shift Right Arithmetic Variable JR 0011 1sss 0000 0000 Jump JALR 0100 0sss 000d dd00 Jump and Link NOR 0100 1sss tttd dd00 NOR ADD 0101 0sss tttd dd00 ADD SUB 0101 1sss tttd dd00 SUB

AND 0110 0sss tttd dd00 AND OR 0110 1sss tttd dd00 OR XOR 0111 0sss tttd dd00 XOR SLT 0111 1sss tttd dd00 Set on Less than NOOP 0000 0000 0000 0000 Table 1: ISA encoding s A 3-bit source register t A 3-bit source/destination register d A 3-bit destination register i A 5-bit immediate value, or a 11-bit immediate value h A 2-bit shift amount value Control Signals: 18 internal control signals will be generated from the 16-bit input instruction. They are as follows. NOTE: These are the primary control signals. You may decode and manipulate them as needed as secondary internal control signals. The truth table can be seen in Table 2. o SO Shift unit operation code (2-bits) o V asserted when instruction is a variable shift (1-bit) o MO math unit operation code (3-bits) o I invert second operand (1-bit) o M2 select source for second operand (2-bit) o F1 asserted when rs field identifies a source operand (1-bit) o F2 asserted when rt field identifies a source operand (1-bit) o MR asserted when instruction reads from the data Memory Unit (1- bit) o MW asserted when instruction writes into the data memory Unit (1- bit) o RW asserted when instruction writes into Unit (1-bit) o DR select destination register number (2-bit) o DO select operand to be written (2-bit) The truth table for the control signals is as follows: Shift Math General Instr SO V MO I M2 F1 F2 MR MW RW DR DO SLL 00 0 XXX X XX 0 1 0 0 1 10 10 SRL 10 0 XXX X XX 0 1 0 0 1 10 10 SRA 11 0 XXX X XX 0 1 0 0 1 10 10 SLLV 00 1 XXX X XX 1 1 0 0 1 10 10 SRLV 10 1 XXX X XX 1 1 0 0 1 10 10

SRAV 11 1 XXX X XX 1 1 0 0 1 10 10 JR XX X XXX X XX 1 0 0 0 0 XX XX JALR XX X XXX X XX 1 0 0 0 1 10 11 ADD XX X 000 0 00 1 1 0 0 1 10 01 SUB XX X 000 1 00 1 1 0 0 1 10 01 AND XX X 100 X 00 1 1 0 0 1 10 01 OR XX X 101 X 00 1 1 0 0 1 10 01 XOR XX X 110 X 00 1 1 0 0 1 10 01 NOR XX X 111 X 00 1 1 0 0 1 10 01 SLT XX X 010 1 00 1 1 0 0 1 10 01 ADDI XX X 000 0 01 1 0 0 0 1 00 01 SLTI XX X 010 1 01 1 0 0 0 1 00 01 ANDI XX X 100 X 10 1 0 0 0 1 00 01 ORI XX X 101 X 10 1 0 0 0 1 00 01 XORI XX X 110 X 10 1 0 0 0 1 00 01 LUI XX X 101 X 11 1 0 0 0 1 00 01 LW XX X 000 0 01 1 0 1 0 1 00 00 SW XX X 000 0 01 1 1 0 1 0 XX XX BLTZ XX X XXX X XX 1 0 0 0 0 XX XX BGEZ XX X XXX X XX 1 0 0 0 0 XX XX BEQ XX X 000 1 00 1 1 0 0 0 XX XX BNE XX X 000 1 00 1 1 0 0 0 XX XX BLEZ XX X 000 1 00 1 1 0 0 0 XX XX BGTZ XX X 000 1 00 1 1 0 0 0 XX XX J XX X XXX X XX 0 0 0 0 0 XX XX JAL XX X XXX X XX 0 0 0 0 1 01 11 Table 2: Truth table for control signals Data Path Layout The 8-bit datapath can be implemented by constructing a 1-bit slice horizontally and stacking 8 1-bit slices vertically. This layout floorplan for the data path is shown in Figure 2. The data flows horizontally from left to right. The control signals run vertically across each slice. You may implement some functional blocks as 8-bits, but make sure you organize it in 1-bit slices that can be pitch-matched to the other functional blocks. Address ALU Op Shift Op MUX Pipeline MUX Pipeline ALU Shifter Pipeline ALU Shifter Pipeline

MUX Pipeline.. ALU Shifter Pipeline Figure 2: Floorplan of the data path File Description The 8-bit register file should be implemented as an 9x8 SRAM. The first 8 registers are user available registers and the last and final register is a special register for Jump and Link instructions You should implement a precharge and a sense amplifier with the SRAM blocks. The register file should have two 8-bit read (output) ports and one 8- bit write (input) port. Input to the register file can come either from the ALU output from the data memory. The register file uses an enable signal for the general task of enabling inputs to and outputs from the register file at the proper time. The register file must be carefully designed to properly implement the enable function. In the read cycle, register file output is passed to the pipeline register when enable is high; otherwise the output should be at high impedance, and goes to the ALU on the next clock cycle. The output should also be at high impedance during the entire write cycle, when the R/W signal is high. In the write cycle, data is stored in the SRAM only when enable is high; otherwise the data from the pipeline register should not be allowed to affect the SRAM register file. To accomplish this, the enable signal should act on the SRAM address decoder outputs. The register file address actually contains two independent 3-bit addresses. Address A should specify the address for data passed to the ALU input A, the address B should specify ALU input B. Address A is also used to specify the address for storing data to the SRAM during a write cycle. Pipeline s Four pipeline registers are needed for the different stages and can be seen in Figure 1. These pipeline registers should be clocked by the master clock. On the rising edge of the master clock the registers should latch on to the output from the previous stage and hold it constant for the next stage. Parallel load D-flip flop registers should be used for these pipeline registers. Clock/Timing Description One master clock is required for the data path. On each rising clock edge the register pipelines latch on to the result of the previous stage. This result is then held in the register for the next stage of the pipeline. This provides a latching effect allowing the combinatorial circuits to perform their operations. The clock frequency should be chosen

such that the worst case operation across all pipeline stages has time to complete. For example, say the worst case operation is the write to SRAM, then the clock frequency should be chosen such that it is greater than the propagation delay of the SRAM write operation. Project Requirements The following cell views for all circuit blocks must be created using Cadence tools and stored in your group project directory: Schematic Symbol Layout Extraction The complete data path cell must pass the following tests and verifications: Functional simulation of all data path functions DRC LVS Input and output signals must be named as follows: Inputs Outputs The following characteristics must be measured and reported: Slowest logic function and propagation delay for that function (ALU) Slowest arithmetic function and propagation delay for that function (ALU) Slowest propagation delay of the data path (ALU+Shifter) (Propagation delays are to be measured from the clock edge to the last output transition) Physical area of the complete data path layout Total number of transistors in the data path (available from the final LVS output) Total, static and dynamic power consumption for a full read and write cycle during an ADD and Shift_Left 2 operation. Circuit Requirements A Manchester carry look-ahead adder must be implemented A barrel shifter must be implemented The register file must be implemented with a multi-port 8x8 SRAM The pipeline registers must be implemented with parallel load DFFR registers Minimum sized transistors can be used, but are not required When possible, existing primitive cells (INV,NAND, etc.) should be used to construct circuit blocks. Any new primitive cells (such as the carry generation circuits and control logic) must be designed at the transistor level. All required functions must be implemented using only the specified control/select

inputs. You must employ hierarchical design where you instantiate lower level cells into higher level circuit block, both in schematic and layout design Buffers should be designed and included wherever fan-out is larger than 5 gates. Layout Requirements Follow all previously established layout guidelines for cell size (pitch), internal routing, input/output pad placement, etc. All designed circuit schematics must be laid out, must pass DRC, and must pass LVS. All functional blocks should be designed using only metal 1 and metal 2, reserving metal 3 for routing chip-level control and data signals in the top level of hierarchy. All cell I/O signals and power traces must be connected to single points in the final cell. i.e., you cannot have multiple ports for VDD, GND, or any I/O signal. Attempt to consistently route metal 1 and metal 2 in perpendicular directions. You may need to twist cells around and stack them in seemingly odd ways to achieve your goals, but always make sure you have good power supply to all cells and that you have good access to run metal 2 and metal 3 in a single direction in your final data path cell. Project Planning and Considerations Your group should decide during the proposal stage exactly what cells you will need to complete your project. Although you can alter this plan slightly if necessary, you should identify early on what functions you will construct at the 1- bit level, at the 4-bit level, and at the full 8-bit level. Your group must decide which tasks of this design project will be completed for each design phase (Labs 8-10). Since layout and final simulation/characterization take much more time than schematic capture and functional simulation, you must get beyond the schematic stage by Lab 9. It is always best to pass DRC and LVS on smaller cells BEFORE using them to construct larger cells. Final LVS can be very time consuming for many groups. Deliverables: All schematic and layout cells required to construct the data path must be within th group directory. Simulations should be performed to verify ALL data path functions, several of which will be tested during the project demonstration. In addition, the following must be provided within the final project report: Data Path 1 Select schematics that demonstrate your design effort 2 Final data path schematic and symbol

3 Truth table of control inputs and ALU/shifter functions 4 Functional simulations, including loading data from memory, perform XOR, A+B, and A- B operations on the data, and saving the result to another register in the register file. 5 Final data path layout and any smaller cell layouts that you feel you did an exceptional job optimizing 6 Discussion of LVS process, what problems did you have, how did you solve them, etc. 7 Post layout simulations showing propagation delays as specified above 8 Table of cell specifications including -Measured cell width, height, and area, number of transistors, and total area/transistor -Measured performance characteristics (worst-case delays and power)