ECE 154B Spring Project 4. Dual-Issue Superscalar MIPS Processor. Project Checkoff: Friday, June 1st, Report Due: Monday, June 4th, 2018


Overview:

Some machines go beyond pipelining and execute more than one instruction per cycle. How is this practical? In class, we learned about superscalar processors. The basic idea is that if we duplicate some of the functional units of the processor and provide logic to issue several instructions concurrently, the effective CPI drops below one. There are two general approaches to multiple issue: static multiple issue, with issue scheduling performed at compile time, and dynamic multiple issue, with issue scheduling performed in hardware (also known as superscalar).

In this lab, you will implement a superscalar dual-issue processor with in-order scheduling. Like the original Pentium processor, you will design a 5-stage pipeline, and like the ARM Cortex-A8 (used, e.g., in the iPhone 4), it will be based on a RISC instruction set architecture. The idea is that your 5-stage pipelined processor keeps the simple memory hierarchy and branch predictors that you developed in the previous labs; in lab 4, however, your processor will be capable of running two instructions per cycle!

Fig. 1: Intel 80586 (P5) and ARM Cortex-A8 processors
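To build intuition for "CPI effectively less than one," it helps to work a small example. The sketch below is a toy Python model, not part of the lab deliverables, and the instruction and cycle counts in it are made-up illustrative numbers:

```python
# Toy model of why dual issue pushes effective CPI below 1.
# All counts below are illustrative assumptions, not lab requirements.

def effective_cpi(n_instructions, dual_issue_cycles, single_issue_cycles, stall_cycles):
    """CPI = total cycles / instructions retired."""
    total_cycles = dual_issue_cycles + single_issue_cycles + stall_cycles
    return total_cycles / n_instructions

# Ideal case: 1000 instructions, all issued two at a time, no stalls.
ideal = effective_cpi(1000, dual_issue_cycles=500, single_issue_cycles=0, stall_cycles=0)

# With hazards: only 300 cycles manage to pair up (600 instructions),
# 400 instructions issue alone, plus 150 stall cycles.
realistic = effective_cpi(1000, 300, 400, 150)

print(ideal)       # 0.5
print(realistic)   # 0.85
```

The ideal case matches the maximum throughput of two instructions per cycle; the second case shows how quickly stalls and unpaired issue slots erode that advantage.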

Dynamic (out-of-order) scheduling is a feature of all current x86 architectures. In such processors, dependencies are handled by complex control logic such as Tomasulo's algorithm, which we considered in class. In this lab, you should implement in-order scheduling, just as you did in lab 1, which is representative of earlier ARM and MIPS processors. Handling hazards is more straightforward with static in-order scheduling; however, the number of possible dependencies that your hazard logic must handle can be quite large (and you may come to appreciate why it might be conceptually simpler to handle all dependencies in a centralized table, as in the scoreboard technique). A good approach is to make a complete list of dependencies and come up with a solution (e.g., stalling the processor or forwarding data) for each case. Then apply these techniques to your current architecture.

Hints:

Before designing, you should consider a couple of points that will help you with the lab. The timing diagram of a dual-issue pipelined processor (in the ideal case) is shown in Figure 2. Note that here, "ideal" means that there is no stalling and the processor is running at its maximum throughput of two instructions per cycle. In your processor, stalling will limit performance significantly.

Fig. 2: The timing diagram of a dual-issue MIPS
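The "complete list of dependencies" approach can be prototyped before you touch any Verilog. The Python sketch below is one way to classify hazards between two co-issued instructions; the tuple format and the rules are simplified assumptions for illustration, not the required lab interface:

```python
# Sketch of hazard enumeration between two instructions issued in the
# same cycle. Each instruction is modeled as (opcode, dest_reg, src_regs);
# this representation is an assumption made for this example.

def classify_hazards(older, younger):
    """Return the RAW/WAW hazards between two co-issued instructions."""
    op_o, dest_o, _ = older
    op_y, dest_y, srcs_y = younger
    hazards = []
    if dest_o is not None and dest_o in srcs_y:
        # Younger reads what older writes in the same issue slot:
        # forwarding cannot help within one cycle, so decode must
        # stall or split the pair.
        hazards.append(("RAW", dest_o))
    if dest_o is not None and dest_o == dest_y:
        # Both write the same register; writeback order must be preserved.
        hazards.append(("WAW", dest_o))
    return hazards

# lw $2, 0($1) followed by add $3, $2, $4 -> RAW on $2
print(classify_hazards(("lw", 2, (1,)), ("add", 3, (2, 4))))    # [('RAW', 2)]
# Independent pair -> no hazard, both may issue
print(classify_hazards(("add", 5, (1, 2)), ("sub", 6, (3, 4))))  # []
```

Extending a table like this across pipeline stages (decode vs. EX, decode vs. MEM, and so on) gives you exactly the case list your stall/forward logic has to cover.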

For the first lab, we considered a variety of instructions: add, addu, addi, addiu, sub, subu, and, or, xor, xnor, andi, ori, xori, slt, sltu, slti, sltiu, lw, sw, lui, j, bne, beq, mult, and multu. For this lab, the mult and multu instructions are reserved for the extra-credit questions, so you may ignore them in your design.

Figure 3 shows the simplified architecture of a dual-issue processor. Some blocks and signals are omitted for clarity. There are a few questions you should ask yourself before designing this processor. In particular, you have to know what is involved in fetching two instructions per cycle, decoding two instructions per cycle, executing two ALU operations per cycle, accessing the data cache twice per cycle, and writing back two results per cycle.

Fig. 3: 5-stage dual-issue pipeline

For two-way-wide fetching, the problem is easy when we don't have a cache in our processor, but handling branch instructions is a bit tricky. Basically, at each clock cycle, two instructions (64 bits) must be read from the instruction memory. The first step is to make it a dual-port ROM. For branch prediction, you may access the branch target buffer in parallel (you have to modify its structure), or you may come up with any other solution. If the first instruction, or both the first and second instructions, are predicted not taken, then the case is relatively straightforward. If the second instruction is predicted taken, then you have to provide the next PC with the appropriate target address. But what if the first one is predicted taken? In this case, you may want to discard the second instruction by inserting a NOP (though executing it may also have some benefits). Using early branch resolution in the decode stage, you will decide whether to flush the next instructions or not. Taking care of mispredictions

should also be straightforward, as you have already done it for a single-issue processor.

Wide decoding is easy in our case because the instruction length is fixed. The register file is now addressable through four read ports and two write ports. You may assume some signals are also bypassed or sign-extended to the EX stage. Obviously, you have to design another control unit to manage the second instruction. The problem that arises here is managing hazards: doubling the number of issue slots roughly quadruples the required stall logic, because you have two instructions in the decode stage and two instructions in every other stage. You have to make a list of all possible hazards. A very important step is to generalize the ideas you applied to the single-issue processor, i.e., forwarding and stalling, to cover all possible hazards.

For the execute, memory, and write-back stages, we will simplify the problem by stalling the processor whenever there is a structural hazard. For simplicity, we assume that the data memory can only process one instruction per cycle. In addition, you may adopt some other rules to simplify your design. For instance, if the older instruction (the first one) stalls, then the younger one must also stall and cannot bypass it. On the other hand, if the younger instruction stalls, the older instruction from the next group may or may not move up; you may assume a rigid pipe (the next instruction doesn't move up) for simplicity. Executing two instructions per cycle is likewise doable with two ALUs and enough bypass logic.

In order to design your processor successfully before the deadline, I strongly recommend that you follow this step-by-step routine.

Step 1: Redesign your architecture to support dual issue. At this stage, you don't need to consider hazards, branches, etc.

Step 2: Test your design by running two consecutive simple ADD instructions.
Step 3: If the design passes the second step, you may increase the number of instructions used to test the pipeline, but still use only simple instructions and avoid hazards.

Step 4: Sketch a complete diagram of your design and identify all possible hazards. (You could do this step even earlier.)

Step 5: Now take care of hazards by either stalling or forwarding. There will be lots of muxes and wires; therefore, you may want to implement it step by step and

as clearly as possible. Your diagram will help you follow every detail. Remember that the basic idea is the same as in the single-issue design.

Step 6: Finally, take care of other details such as branch prediction.

FAQs:

1. Grading: Your grade is mainly based on correct operation; functionality is the primary goal and determines 60% of your score. The rest is based on your report.

2. What happens during the checkoff? You have 10 minutes to present your project. Both group members must be available during the presentation. You may bring your own laptop or use the computers in the ECI lab. Everybody has to explain the whole project and answer some questions.

3. What to turn in? Submit an organized zip file containing the files mentioned below to zfahimi@ucsb.edu by the deadline.

Your report is very important. Start with an introduction, an illustration of the instructions, and your design methodology. Then focus on each of the steps provided in this manual. When describing each step, provide the code, the test bench, and waveforms. Explain why your waveforms are correct and answer all questions. Organization and completeness of the report determine 40% of your score. Figures should be readable, and you have to explain them in detail. Mention how many hours you spent on this lab, your common mistakes in Verilog coding, and the lessons you learned. Finally, provide a conclusion to wrap up the project. Cite any references appropriately.

Also include a folder containing the project files, including all source files, test benches, and waveforms. Please comment your code heavily; poorly commented code will not be graded.
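As a closing illustration, the in-order issue rules from the hints (rigid pipe, single-ported data memory) can be summarized in a few lines. This is a Python behavioral sketch for intuition only, not part of the required Verilog deliverables, and the predicate names are stand-ins for your real hazard-detection signals:

```python
# Behavioral sketch of the simplified in-order dual-issue rules:
# - rigid pipe: if the older instruction stalls, the younger one
#   stalls too and may not bypass it;
# - the data memory is single-ported, so at most one lw/sw issues
#   per cycle (a structural hazard otherwise).
# The boolean inputs stand in for hazard-detection signals.

def issue_count(older_stalls, younger_stalls, both_access_memory):
    """How many of the two decoded instructions issue this cycle."""
    if older_stalls:
        return 0          # rigid pipe: both instructions stall
    if younger_stalls or both_access_memory:
        return 1          # only the older instruction issues
    return 2              # ideal dual issue

print(issue_count(False, False, False))  # 2 (ideal dual issue)
print(issue_count(False, False, True))   # 1 (lw/sw pair: one D-mem port)
print(issue_count(True,  False, False))  # 0 (older stalls, so both stall)
```

Encoding the same decision as combinational logic in your decode stage is one way to keep the stall cases manageable.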