The CPU Design Kit: An Instructional Prototyping Platform for Teaching Processor Design


Anujan Varma, Lampros Kalampoukas, Dimitrios Stiliadis, and Quinn Jacobson
Computer Engineering Department
University of California
Santa Cruz, CA 95064
e-mail: varma@cse.ucsc.edu
October 28, 1995

1. Introduction

Many universities now offer a course on processor design or computer design as part of the computer engineering curriculum. The objective of such a course is to teach the fundamental concepts involved in the design of a Central Processing Unit (CPU) by means of a hands-on project. Typically, the project involves the design, fabrication, and testing of a small CPU within a semester- or quarter-long class. Until a few years ago, these CPU designs were prototyped using bit-slice ALU chips in combination with microprogrammed control units. This approach limited the scope and flexibility of the project; key concepts such as pipelining, common among modern RISC processors, could not be explored. With the availability of dense programmable logic chips such as the Altera FLEX series, it has become possible to construct, within a short time, more complex designs that reflect the state of the art in CPU design. Because of the complexity of the chips, however, it is still difficult to prototype and debug a CPU design employing multiple FPGA chips within the classroom environment.

The CPU Design Kit is a prototyping platform, consisting of both hardware and software, designed to facilitate the construction of such projects within a limited period of time, such as an academic quarter or semester. It is designed to allow implementation of a pipelined 32-bit processor with minimal effort on fabrication and testing. The prototyping hardware consists of an ISA-bus-based printed-circuit board containing six Altera FLEX EPF81500 programmable logic chips, static RAM chips to implement the register file and caches, and hardware support for monitoring and debugging. Software to be made available with the platform includes tools to load compiled designs into the FLEX devices, set and display the contents of the register file and caches, monitor the state of signals on the board, and control execution of the CPU.

2. Board Architecture and Hardware

Figure 1 shows the devices on the prototyping board, and Figure 2 illustrates the data and control paths on the board when implementing a 32-bit pipelined processor similar to the DLX described by Hennessy and Patterson [1]. The hardware consists of five Altera FLEX EPF81500 chips, each intended for implementing specific functional blocks of the CPU design. A sixth FLEX device is used to implement the ISA bus interface of the board to the host PC and to control and monitor the execution of a program on the prototype CPU. Each of the FLEX devices provides approximately 16,000 usable gates and 200 I/O pins. Thus, the five devices reserved for the CPU provide a total of about 80,000 usable gates for implementing the design. The architecture of the board has been designed to allow implementation of a 32-bit pipelined RISC CPU similar to the DLX, but the board may also be used for other styles of instruction sets.
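The gate budget quoted above is simple arithmetic, and a short sketch makes it easy to see which devices count toward it. The constant and function names below are illustrative assumptions, not part of the kit's software; only the per-device figures and device counts come from the text.

```python
# Figures taken from the text; all names are illustrative only.
GATES_PER_FPGA = 16_000     # approximate usable gates per EPF81500
CPU_FPGAS = 5               # devices available to the student's CPU design
INTERFACE_FPGAS = 1         # ISA-bus interface and monitor device


def cpu_gate_budget() -> int:
    """Usable gates available to the CPU design itself."""
    return CPU_FPGAS * GATES_PER_FPGA


def board_gate_total() -> int:
    """Usable gates on the whole board, including the interface device."""
    return (CPU_FPGAS + INTERFACE_FPGAS) * GATES_PER_FPGA


print(cpu_gate_budget())    # 80000
print(board_gate_total())   # 96000
```

The 80,000-gate figure therefore excludes the sixth device, which is reserved for the host interface and monitoring logic.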

Figure 1: The CPU Design Kit prototyping board.

The functional blocks of the CPU design are intended to be partitioned among the five FLEX devices as follows:

1. Instruction Fetch, Decode, and Branch Unit: These functions are implemented by FPGA-1 in Figure 2. This module maintains the program counter, handles fetching of instructions from the instruction cache, and implements conditional branches and jumps.
2. Arithmetic Logic Unit: The ALU module is implemented by two FLEX devices, FPGAs 2 and 3 in Figure 2. FPGA-2 is meant to be the primary ALU chip, with FPGA-3 providing additional logic for logic-intensive functions such as a barrel shifter.
3. Data Memory Interface: FPGA-5 provides the interface to the data cache and allows routing of data between the ALU/register file and the data cache.
4. Control/Forwarding Unit: The logic needed for forwarding and control of the pipeline is intended to be placed in FPGA-4.

Program execution and debugging of the CPU are controlled by logic within the sixth FPGA, which also implements the ISA bus interface. This FPGA can be programmed to control execution of programs by single-stepping, setting up breakpoints, etc. It also generates the clocks that control all system timing from a 40 MHz master clock fed to it. The clocks are distributed to all the FPGAs via four dedicated lines.

In addition to the FPGAs, the board also contains a register file. The register file is a 4-port 4K × 32-bit static RAM module. Three of the four ports of the register file are connected to the ALU; in a three-address register architecture such as the DLX, two of the ports are used as read ports and the third as a write port. The remaining fourth port is used by the system monitor software to access the register file so that register contents can be set and displayed during debugging.
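The port arrangement of the register file can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class and method names are hypothetical, and assigning ports B and C to reads, port D to writes, and the remaining port to the monitor is a guess based on the signal names in Figure 2, not something the text states.

```python
class RegisterFile:
    """Behavioral sketch of a 4-port register file (illustrative, not the board's RAM)."""

    def __init__(self, depth=4096, width=32):
        self.depth = depth
        self.mask = (1 << width) - 1      # keep stored values within `width` bits
        self.words = [0] * depth

    def _check(self, address):
        if not 0 <= address < self.depth:
            raise IndexError(f"register address {address} out of range")

    def cycle(self, read_b, read_c, write_d=None, write_data=0):
        """One instruction's accesses: two operand reads, then an optional write."""
        self._check(read_b)
        self._check(read_c)
        operands = (self.words[read_b], self.words[read_c])
        if write_d is not None:
            self._check(write_d)
            self.words[write_d] = write_data & self.mask
        return operands

    # Fourth port (assumed): used only by the host monitor software.
    def monitor_read(self, address):
        self._check(address)
        return self.words[address]

    def monitor_write(self, address, value):
        self._check(address)
        self.words[address] = value & self.mask


rf = RegisterFile()
rf.monitor_write(1, 5)                    # host presets r1 and r2
rf.monitor_write(2, 7)
a, b = rf.cycle(read_b=1, read_c=2, write_d=3, write_data=5 + 7)  # r3 = r1 + r2
print(a, b, rf.monitor_read(3))           # 5 7 12
```

Keeping the monitor accesses on a separate port is what lets the debugging software inspect or preset registers without touching the three ports wired to the ALU.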

[Block diagram. Blocks: Instruction Cache; FPGA 1 (Instruction Fetch/Branch Unit); Register File; FPGA 2 and FPGA 3 (Execution Units); FPGA 4 (Control/Forwarding Unit); FPGA 5 (Memory/Write Back Unit); Data Cache. Labelled signals include address(13:0) and data(31:0) on each cache, portb_address(4:0), portc_address(4:0), portd_address(4:0), portb_data(31:0), portc_data(31:0), portd_data(31:0), ex_control(31:0), frwd_control(31:0), code(15:0), data_out(31:0), mem_data_out(31:0), and write_back_data(31:0).]

Figure 2: Block diagram of the prototyping board. The five FPGAs shown are used to implement the CPU design. The sixth FPGA, used to implement the ISA-bus interface as well as control and debugging of the CPU design, is not shown.
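The bus widths labelled in Figure 2 determine how much storage each bus can address. As a minimal sketch (the function name is hypothetical), a 14-bit word address, address(13:0), with a 32-bit data bus gives the 16K-word, 64K-byte cache size quoted in the text, and a 5-bit port address selects one of 32 registers.

```python
def addressable_capacity(address_bits, data_bits):
    """Return (words, bytes) reachable with the given address and data widths."""
    words = 1 << address_bits             # 2 ** address_bits distinct word addresses
    return words, words * data_bits // 8  # 8 bits per byte


# address(13:0) and data(31:0) on each cache
print(addressable_capacity(14, 32))       # (16384, 65536)

# portb_address(4:0): number of registers one port address can select
print(addressable_capacity(5, 32)[0])     # 32
```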

The memory system of the board consists of separate instruction and data caches. Each cache has a capacity of 16K words (64K bytes). The instruction cache is directly connected to FPGA-1, which implements the instruction-fetch function, and the data cache is interfaced to the ALU/register file via FPGA-5. The address for the data cache is driven by the ALU output; data may be stored into the cache from either of the two register ports feeding the ALU, or loaded from the cache into the third (write) port of the register file.

In addition to the logic resources needed to prototype the CPU, the board also contains hardware to facilitate debugging and benchmarking. The sixth EPF81500 FLEX device forms the core of the monitoring hardware and provides the following facilities:

1. Access to internal registers and signals within the FPGAs through a dedicated debug bus. The user may define accessible registers within the design for debugging, which can then be monitored or set via the debug bus. Software tools allow symbolic names to be attached to these registers for convenience.
2. Monitoring of the state of signals on the board from the PC. This is achieved by implementing a multiplexing function that receives some of the on-board signals on its inputs and multiplexes them to the ISA-bus interface. This allows "probing" all signals on the board during debugging without accessing them physically.
3. Monitoring of logic events, similar to a logic analyzer, achieved by programming the FLEX device to watch for the signal states of interest. Events can be signaled to the interface FPGA by any of the other devices via the global bus.
4. Performance monitoring, implemented within this FPGA by means of counters that keep track of performance metrics such as total cycles of program execution, number of pipeline stall cycles, etc.

3. Software Tools

Using the CPU Design Kit to prototype a CPU design efficiently requires a variety of software tools.
As part of this project, we are currently developing software to provide a complete environment for processor design. Altera's MaxPlus tools already provide an integrated environment for design entry, synthesis, and simulation of the system from a high-level language description of the target hardware. We are supplementing them with tools that are specific to the CPU Design Kit environment. The following are the key tools currently under development:

1. Tools to download the compiled designs into the FLEX devices on the board.

2. Tools to control clocking, enabling single-stepping, breakpoint execution, etc.
3. Tools to monitor on-board signals and set up event monitors. This provides a logic-analyzer-like function. The software will allow attaching labels to signals and displaying them in a variety of formats.
4. Software to read and modify the register file, instruction cache, and data cache.
5. Performance-monitoring tools to set up counters and generate statistics of interest to the user.
6. A Meta-Assembler that allows generation of code for custom instruction sets with minimal effort. Many public-domain tools are already available for this purpose.

The user interface of the software is currently based on Microsoft Windows. Figure 3 shows a snapshot of the user interface being developed. The tools may be ported to the Unix/X Windows platform in the future.

Figure 3: Snapshot of Windows-based user interface being developed.

4. Course Organization

Because of the tight time constraints in the quarter system at UCSC, we have found it important to organize the project so that students can complete the design task in a sequence of well-defined stages. The CPU Design Kit allows the processor design project to be carried out in phases, each phase implementing a part of the design. For example, assuming a 5-stage pipeline as in the example implementation of Hennessy and Patterson [1], the entire design of the pipelined CPU can be structured into four phases as follows:

1. The instruction fetch and branch logic is implemented and tested in the first phase. This requires design and programming of only one of the five FPGAs. Only the first two stages of the pipeline are implemented in this phase.
2. The second phase adds the ALU and register file to the design and expands the pipeline to five stages. All the ALU instructions are implemented and tested in this phase. Note that all the data hazards introduced by the pipeline are ignored in this phase.
3. Phase 3 involves adding the data memory interface to the design so that load and store instructions can be tested.
4. Finally, the logic necessary to implement all the forwarding requirements is incorporated in Phase 4, thus completing the design.

We have found that this organization allows students to accomplish the entire design and implementation within the limited time of one academic quarter.

5. Example Design

We have verified the utility of the prototyping board by designing and simulating a complete 5-stage pipelined RISC processor similar to the DLX as described by Hennessy and Patterson [1]. The entire instruction set of the DLX was implemented, except for instructions dealing with floating point and exceptions. The logic needed for the design was well below the capacity of the FPGAs. The utilizations of the individual FPGAs are shown in Table 1. The entire logic for implementing the ALU and barrel shifter fitted within one of the FPGAs, leaving the second execution-unit FPGA unused. Thus, the board leaves room for attempting substantially more complex designs in the future.

Table 1: Device utilizations with an example design of a pipelined RISC processor.

Device                           Utilization (%)
Instruction Fetch/Branch Unit    23.9
Execution Unit 1                 57.1
Execution Unit 2                 0
Data Memory Interface            7.7
Control/Forwarding Unit          11.6

6. Conclusions and Status

The CPU Design Kit is a flexible instructional prototyping platform tailored for teaching a laboratory course on processor design, using state-of-the-art Altera FLEX programmable logic devices. The board will enable students to design, debug, and benchmark a 32-bit pipelined RISC processor within the limited period of one academic quarter or semester (10-15 weeks). The CPU Design Kit includes a PC-based prototyping board, software tools to facilitate design and debugging of student projects on the board, and instructional material. Testing of the assembled boards is currently in progress and is expected to be complete by the end of July, 1995. Software tools are also currently under development. The system will be used in the processor design class at UCSC in Fall '95 and subsequently made available to other universities if there is interest.

References

[1] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 1995.
[2] Altera Corporation, FLEX 8000 Handbook, 1995.