Lab 3: Pipelined MIPS Assigned: Wed., 2/13; Due: Fri., 3/1 (Midnight)

CMU 18-447 Introduction to Computer Architecture, Spring 2013
Instructor: Onur Mutlu
TAs: Justin Meza, Yoongu Kim, Jason Lin

"It is comparable to a pipeline which, once filled, has a large output rate no matter what its length. The same is true here. Once the flow is started, the execution rate of the instructions is high in spite of the large number of stages through which they progress."
Chapter 14. The Central Processing Unit. Erich Bloch. Planning a Computer System: Project Stretch. IBM. McGraw-Hill, 1962.

1. Introduction

In Lab 2, you implemented single-cycle MIPS machines. In this lab, you will implement pipelined MIPS machines. The pipeline must consist of five stages, as discussed in class and covered in the textbook (Patterson and Hennessy).

Now that multiple instructions can be in flight at the same time, you must detect dependences between instructions and handle them correctly. In this lab, we will consider only data dependences and ignore control dependences. Specifically, you will implement two pipelined MIPS machines that handle data dependences in two different ways: stalling and forwarding. Both MIPS machines must be correct and synthesizable.

Warning. This lab is difficult. We strongly encourage you to get started early.

Extra Credit. Top students will receive extra credit (up to 20%) for implementing the highest-performing MIPS machines. Only correct and synthesizable implementations will be eligible.

2. Specifications of the MIPS Machine

2.1. Architecture

Instruction Set. The machine supports all MIPS instructions specified in Lab 1, excluding those related to division and control flow: DIV, DIVU, BEQ, BGEZ, BGEZAL, BGTZ, BLEZ, BLTZ, BLTZAL, BNE, J, JAL, JALR, JR. As shown in the following table, there are 39 MIPS instructions that the machine supports.

    ADD     ADDU    ADDI    ADDIU   AND     ANDI    BEQ     BGEZ
    BGEZAL  BGTZ    BLEZ    BLTZ    BLTZAL  BNE     DIV     DIVU
    J       JAL     JALR    JR      LB      LBU     LH      LHU
    LUI     LW      MFHI    MFLO    MTHI    MTLO    MULT    MULTU
    NOR     OR      ORI     SB      SH      SLL     SLLV    SLT
    SLTI    SLTIU   SLTU    SRA     SRAV    SRL     SRLV    SUB
    SUBU    SW      SYSCALL XOR     XORI

The table lists every Lab 1 instruction. The 14 division and control-flow instructions named above are not supported, which leaves the 39 supported instructions.

System Call Instruction. Terminates the program (same as Lab 2). In addition, for debugging and grading purposes, you must ensure that the contents of the register file are dumped out in the same format as Lab 2.

Exceptions. No support (same as Lab 2).

Branch Delay Slot. Does not apply: the machine does not support control-flow instructions.
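
As a quick sanity check on the instruction count, the supported set can be derived by removing the excluded mnemonics from the full list. The sketch below is illustrative Python, not part of the lab's Verilog sources; the variable names are made up for this example, and the mnemonic lists are transcribed from this handout.

```python
# Illustrative check only; these names are hypothetical and not part of
# the lab sources. Mnemonics are transcribed from this handout.
LAB1_INSTRUCTIONS = """
ADD ADDU ADDI ADDIU AND ANDI BEQ BGEZ BGEZAL BGTZ BLEZ BLTZ BLTZAL BNE
DIV DIVU J JAL JALR JR LB LBU LH LHU LUI LW MFHI MFLO MTHI MTLO MULT
MULTU NOR OR ORI SB SH SLL SLLV SLT SLTI SLTIU SLTU SRA SRAV SRL SRLV
SUB SUBU SW SYSCALL XOR XORI
""".split()

# Division and control-flow instructions that Lab 3 leaves out.
EXCLUDED = set(
    "DIV DIVU BEQ BGEZ BGEZAL BGTZ BLEZ BLTZ BLTZAL BNE J JAL JALR JR".split()
)

# Keep the original ordering while filtering out the excluded mnemonics.
SUPPORTED = [m for m in LAB1_INSTRUCTIONS if m not in EXCLUDED]

print(len(LAB1_INSTRUCTIONS), len(EXCLUDED), len(SUPPORTED))  # prints: 53 14 39
```

A list like SUPPORTED can also serve as a checklist when writing test inputs, so that every supported instruction is exercised at least once.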

2.2. Microarchitecture

Unlike a single-cycle microarchitecture, a pipelined microarchitecture divides the work required to execute an instruction across multiple cycles. Each cycle corresponds to a stage within a pipeline. The major advantage of a pipelined microarchitecture is that it can execute multiple instructions in parallel: multiple instructions can be in the pipeline at the same time, albeit at different stages.

Pipeline Stages. For this lab, you must implement the following five-stage pipeline. Please ensure that there are exactly five stages. Please ensure that each stage does exactly what it is supposed to do (no more, no less). For example, as long as the memory is accessed only during the MEM stage, you are allowed to generate control signals (or perform other bookkeeping activities) for other stages. Later on, for extra credit, you will be allowed to design your own custom pipeline if you want.

    Stage     Specification
    1. IF     Instruction fetch
    2. ID     Instruction decode and register file read
    3. EX     Execution or memory address calculation
    4. MEM    Memory access
    5. WB     Writeback to register file

Handling Data Dependences. For this lab, you must implement two five-stage pipelines, each of which handles data dependences in a different way. You will submit both of these implementations separately.

1. Stalling. When a data dependence is detected, simply prevent later instructions from entering/progressing through the pipeline. This leads to idle pipeline stages referred to as bubbles. For this lab, your implementation must stall only when necessary.

2. Forwarding. When a data dependence is detected, allow an earlier instruction to send data directly to a later instruction even before the data has been written back into the register file. For this lab, you must forward data into the end of the decode stage.
Your implementation is still allowed to stall, but only when stalling cannot be prevented by forwarding data.

3. Lab Resources

3.1. Source Code

The source code is available at: /afs/ece/class/ece447/labs/lab3. Do NOT modify any files or folders unless explicitly specified in the list below.

    Makefile
    update.sh
    src/ (NOT PROVIDED; PLEASE USE YOUR OWN FOLDER FROM LAB 2)
    447src/ Supplementary source code.
        multiply_coprocessor.v: The multiplier module.
        regfile.v: An array of 3-ported registers. (A file is an array of registers.)
        syscall_unit.v: The system call unit: exercised only when the syscall instruction is invoked to terminate a program.
        exception_unit.v: The exception unit: not exercised.
        mips_mem.v: The memory module.
        testbench.v: The testbench.
    447inputs/ Example test inputs.
    inputs/ (MODIFIABLE) You are allowed to add your own test inputs.
    outputs/ (MODIFIABLE) You can direct your outputs to here.
    447ncsim/ Config. file for the ncsim tool.
    ncsim/ (MODIFIABLE) Ignore. If you really want, you can implement a customized config. file.
    447xst/ Config. files for the xst tool. (We are synthesizing for a Virtex-5 FPGA from Xilinx.)
    447util/

The source code is exactly the same as Lab 2, except for the following three differences. First, we do not provide a src/ folder. Instead, please use your own src/ that you completed for Lab 2. Second, we provide you with a multiplication module: 447src/multiply_coprocessor.v. Please use this to implement the multiplication-related instructions. Third, there is a minor change in the 447xst/ folder which you can ignore.

3.2. Software Tools

We provide software tools for compiling, simulating, and synthesizing your MIPS machine. In order to use the tools, remotely log into an ECE server (e.g., ece[000-008].ece.cmu.edu from off-campus and ece[009-031].campus.ece.cmu.local from on-campus) and execute the following command in your bash shell. The command will modify your shell's environment variables to set up the correct AFS paths to the tool binaries as well as the licenses for using them. We highly recommend doing the lab on the ECE servers.

$ source /afs/ece/class/ece447/bin/setup447

The following is the list of software tools that are involved in this lab. For your benefit, we also provide their technical documents on the course website (the documents match the versions of the binaries). If you have questions about the tools, please consult the technical documents first.

    Tool        Description                      Documentation
    ncvlog      Verilog compiler (Cadence)       ncvlog.pdf
    ncelab      Verilog elaborator (Cadence)     ncvlog.pdf
    ncsim       Verilog simulator (Cadence)      ncvlog.pdf
    simvision   Waveform viewer (Cadence)        simvision.pdf, simviscmdref.pdf
    xst         Verilog synthesizer (Xilinx)     xst.pdf

Among these, simvision is the only tool that you will directly invoke. All other tools will be automatically invoked by the Makefile that we provide. Since simvision requires a graphical interface, make sure that you enable X11 forwarding when you remotely log into an ECE server (i.e., ssh -X ece000.ece.cmu.edu). [1]

[1] If you are running Microsoft Windows on your local machine, you need to install an X server that does the actual rendering. Please install both Xming and Xming-fonts from http://www.straightrunning.com/xmingnotes/.

3.3. Makefile

We provide a Makefile that automates the tedious process of compiling, simulating, verifying, and synthesizing your Verilog implementation. Typing make without any targets or options will invoke the help screen. Please read this help screen carefully!

$ make

The TAs may have to update the supplementary source code in order to fix a bug. Every time you run make for anything, the Makefile also checks for updates by querying the network file system. If the Makefile discovers any updates, it will alert you by flashing an annoying message on your terminal. This message will continue to appear until you run make update to actually apply the updates.

$ make update

Within your current working directory, the updates will patch only the files that you were not allowed to modify. The files and folders that you were allowed to modify will not be touched.

4. Getting Started & Tips

4.1. Getting Started

1. Review the material on pipelined machines from the lecture notes and the textbook (Patterson and Hennessy).

2. Sketch a high-level block diagram of how you want to organize your pipelined machine. Refine this block diagram as you make progress on your implementation. This block diagram will be collected.

3. Copy your src/ folder from Lab 2 into the Lab 3 directory.

4. Implement just enough of the pipeline so that the machine can execute the following stream of instructions where there are no back-to-back instructions: ADDIU, NOP, NOP, NOP, NOP, ADDIU, NOP, NOP, NOP, NOP, ...

While there is no NOP machine instruction defined in the MIPS ISA per se, the SPIM assembler happens to convert a nop assembly instruction into a particular OR machine instruction (0x00000025: $r0 ← $r0 | $r0). While it is certainly not required, for your debugging purposes, we recommend that your machine recognize this particular machine instruction as a NOP.
Your machine should then execute this instruction by doing nothing except for incrementing the program counter. This is a convenient way of forcing an artificial bubble into your pipeline.

Create a test input called inputs/addiu_nop.s which is the same as 447inputs/addiu.s, except that four nop assembly instructions are inserted after every addiu assembly instruction. Simulate this test input to check whether your rudimentary pipeline is correct.

5. Implement just enough of the pipeline so that the machine can execute the following stream of instructions where there are no data dependences: ADDIU, ADDIU, ADDIU, ...

Create a test input called inputs/addiu_addiu.s, which is the same as 447inputs/addiu.s, except that no register is ever read by a later instruction if the register had been written to by an earlier instruction (i.e., no RAW data dependences). Simulate this test input to check whether your rudimentary pipeline is correct.

6. Augment your pipeline with logic for detecting data dependences and stalling so that the machine can execute the following stream of instructions where there are data dependences: ADDIU, ADDIU, ADDIU, ...

Simulate 447inputs/addiu.s to check whether your rudimentary pipeline is correct.

7. Once you have completely implemented the stalling version of your pipeline, use it as a basis for implementing the forwarding version. Remember, you will submit both versions.

4.2. Tips

- Read this handout in detail.

- Please direct questions to the TAs through the online Q&A forum. (The link is available on the course website.)

- When you encounter a technical problem, please read the logs/reports generated by the software tools in the outputs/ folder.

- Your Verilog code will require many wires. Please adopt a consistent scheme for naming them.

- We provide a multiplier module 447src/multiply_coprocessor.v that you can instantiate to implement the multiplication-related instructions: MFHI, MFLO, MTHI, MTLO, MULT, MULTU. You will not be implementing division-related instructions.

- The system call instruction should terminate the program only when all other preceding instructions have completed execution.

- When coding in Verilog, we recommend having a separate module for each pipeline stage (maybe even in separate files). This way, it is easy to draw a separate block diagram for each module.

- Make sure your Verilog implementation is synthesizable.

- For debugging and grading purposes, the system call instruction must dump out the contents of the register file in the same format as Lab 2. Since the $display() Verilog function is not synthesizable, you must include synthesis translate_off and synthesis translate_on (in commented form) before and after the code block that includes $display(). Please refer to the source code from Lab 2 to see how it is done.

- When synthesizing your implementation, check for warnings that mention the word latch. Such a warning most likely indicates a bug in your Verilog code (e.g., incomplete case or if statements).

- The cycle time of your implementation is determined by the critical path of the slowest stage.
While a short cycle time is not necessary to achieve a perfect score on this lab, you should still try to keep your cycle time as short as possible. Keep in mind that cycle time is not the whole story. Forwarding paths may actually increase your cycle time, but also improve overall performance (by reducing stalls and decreasing CPI). Finally, it's important to make your processor work first, and then make it work fast. Premature optimization is the root of all evil.

5. Submission

5.1. Block Diagram

Please submit two hardcopies of the computer-drawn diagram of your MIPS machine: one for the stalling version and the other for the forwarding version. The TAs will collect these diagrams during the Lab Sections and you will be responsible for explaining your design decisions to the TAs.

The diagrams should be at the same level of detail as you saw in the textbook and lecture notes. All major structures (e.g., registers, muxes) should be drawn, as well as boxes for the various control logic blocks. Label all wires with their names and widths. We suggest using different colors (or line styles) to differentiate control-path and data-path wires.

Putting in extra effort to keep this diagram neat and clean will definitely pay off. It is okay to utilize plenty of white space and to span multiple sheets of paper. We recommend using Inkscape (cross-platform), Adobe Illustrator (Windows/Mac), or Microsoft Visio (Windows/Mac).
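
The trade-off in the cycle-time tip of Section 4.2 (forwarding may lengthen the cycle but removes stalls) can be made concrete with a small behavioral model. The following is a minimal Python sketch, not the required Verilog and not a reference solution. It rests on several assumptions that are not stated in this handout: a consumer in ID can receive an ALU result from a producer that is in EX in the same cycle, a load result becomes forwardable only once the load is in MEM, and, in the stalling version, the register file either does or does not let a value written in WB be read in ID in the same cycle (the regfile_bypass flag). All names and both cycle times are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    name: str
    dest: Optional[int]        # register written, or None
    srcs: Tuple[int, ...]      # registers read in ID
    is_load: bool = False


def ready_cycle(prod_id: int, prod: Instr, forwarding: bool,
                regfile_bypass: bool) -> int:
    """Earliest cycle in which a dependent instruction may complete ID."""
    if forwarding:
        # Assumed: ALU results forward from EX, load data from MEM.
        return prod_id + (2 if prod.is_load else 1)
    # Stalling: wait for WB (same cycle only if the regfile bypasses).
    return prod_id + (3 if regfile_bypass else 4)


def simulate(program: List[Instr], forwarding: bool,
             regfile_bypass: bool = True) -> Tuple[int, int]:
    """Return (stall_cycles, total_cycles) for a five-stage pipeline."""
    id_cycle: List[int] = []            # cycle in which each instr is in ID
    last_writer: Dict[int, int] = {}    # register -> index of latest writer
    stalls = 0
    for i, ins in enumerate(program):
        earliest = 2 if i == 0 else id_cycle[-1] + 1
        cycle = earliest
        for reg in ins.srcs:
            if reg != 0 and reg in last_writer:      # $0 never causes a hazard
                j = last_writer[reg]
                cycle = max(cycle, ready_cycle(id_cycle[j], program[j],
                                               forwarding, regfile_bypass))
        stalls += cycle - earliest
        id_cycle.append(cycle)
        if ins.dest is not None and ins.dest != 0:
            last_writer[ins.dest] = i
    return stalls, id_cycle[-1] + 3     # EX, MEM, WB follow the last ID


program = [
    Instr("addiu $1, $0, 5", dest=1, srcs=(0,)),
    Instr("addiu $2, $1, 1", dest=2, srcs=(1,)),
    Instr("lw    $3, 0($2)", dest=3, srcs=(2,), is_load=True),
    Instr("addu  $4, $3, $1", dest=4, srcs=(3, 1)),
]

stall_result = simulate(program, forwarding=False)   # (6, 14)
fwd_result = simulate(program, forwarding=True)      # (1, 9)

# Hypothetical cycle times in ns, chosen only to illustrate the trade-off.
STALL_CYCLE_NS, FWD_CYCLE_NS = 10.0, 12.0
stall_time = stall_result[1] * STALL_CYCLE_NS        # 140.0 ns
fwd_time = fwd_result[1] * FWD_CYCLE_NS              # 108.0 ns
print(stall_result, fwd_result, stall_time, fwd_time)
```

Under these assumptions the forwarding pipeline finishes the four-instruction sequence in 9 cycles instead of 14, so it remains faster overall even with the longer hypothetical cycle time. The real numbers depend on your design and on the behavior of the provided regfile.v.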

5.2. Lab Section Checkoff

So that the TAs can check you off, please come to any of the lab sections before Sat., 3/9. Note that you must be checked off for both versions of your implementation (stalling and forwarding) and also the extra credit if you choose to submit it. You can get them checked off separately in different lab sections or all in one sitting. Please come early during the lab section.

During the Lab Section, the TAs may ask you:

- to answer questions about your implementations,
- to simulate your implementation using various test inputs (some of which you may not have seen before),
- to synthesize your implementation.

5.3. Source Code

Make sure that your source code is readable and documented. Please submit the lab by executing the following commands.

Stalling Version.

$ cp -r src /afs/ece/class/ece447/handin/lab3/andrewid/stalling
$ cp -r inputs /afs/ece/class/ece447/handin/lab3/andrewid/stalling

Forwarding Version.

$ cp -r src /afs/ece/class/ece447/handin/lab3/andrewid/forwarding
$ cp -r inputs /afs/ece/class/ece447/handin/lab3/andrewid/forwarding

5.4. README

In addition, please submit two README.txt files, one for each version. To submit these files, execute the following commands.

$ cp README.txt /afs/ece/class/ece447/handin/lab3/andrewid/stalling/readme.txt
$ cp README.txt /afs/ece/class/ece447/handin/lab3/andrewid/forwarding/readme.txt

The README.txt file must contain the following two pieces of information.

1. A high-level description of your design.
2. A high-level description of the critical path of your synthesized implementation.

It may also contain information about any additional aspect of your lab.

5.5. Late Days

We will write-lock the handin directories at midnight on the due date. For late submissions, please send an email to 447-instructors@ece.cmu.edu with tarballs of what you would have submitted to the handin directory.

$ tar cvzf lab3_stalling_andrewid.tar.gz src inputs README.txt
$ tar cvzf lab3_forwarding_andrewid.tar.gz src inputs README.txt

Remember, you have only 5 late lab days for the entire semester (this applies only to lab submissions, not to homeworks or anything else). If we receive the tarball within 24 hours, we will deduct 1 late lab day. If we receive the tarball within 24 to 48 hours, we will deduct 2 late lab days, and so on.

During this time, you may send updated versions of the tarballs (but try not to send too many). However, once a tarball is received, it will immediately invalidate a previous tarball you may have sent. You may not take it back. We will take your very last tarball to calculate how many late lab days to deduct. If we don't hear from you at all, we won't deduct any late lab days, but you will receive a 0 score for the lab.

6. Extra Credit

For extra credit on this lab, we will hold a performance competition. Among all implementations that are correct and synthesizable, the top [2] students that have the lowest execution time for an undisclosed set of test inputs will receive up to 20% additional credit for this lab (equivalent to 1% additional credit for the course) as well as prizes (at the discretion of the instructor). You may choose to submit the stalling or the forwarding version of the pipeline without making any effort to optimize it further.

All of the guidelines for Lab 3 specified in this handout also apply to the extra credit, except for the following differences.

- As long as it is correct and synthesizable, there are no restrictions on the pipeline. It may have an arbitrary number of stages and an arbitrary way of handling data dependences.

- To make the competition more interesting, you are not required to implement the multiplier module, which is likely to be on the critical path no matter what. We will not run the extra credit with any test input that includes multiplication-related instructions: MFHI, MFLO, MTHI, MTLO, MULT, MULTU.

- The README.txt must include the following two additional pieces of information.
  1. How you have defined your pipeline stages (if different from the reference).
  2. Description of the effort you invested (if any) to improve performance.

- Submission path: /afs/ece/class/ece447/handin/lab3/andrewid/extra

- Tarball (for late submissions): lab3_extra_andrewid.tar.gz

For late submissions, we must receive all three tarballs (stalling, forwarding, extra credit) on the same day. If not, we will deduct the late lab days for whichever tarball we received last.

[2] The instructor reserves all rights for the precise definition of the word top.