Laboratory Exercise 6 Pipelined Processors 0.0


Goals

After this laboratory exercise, you should understand the basic principles of how pipelining works, including the problems of data and branch hazards and possible remedies like forwarding and delayed branching. Finally, you should have an understanding of how the instructions are used to control different parts of a data path through a control unit.

Literature

Patterson and Hennessy: Chapters 6.1-6.6, Appendix C.2
MIPS Lab Environment Reference Manual

Preparation

Read the literature and this laboratory exercise in detail and solve the home assignments.

Home Assignment 1

The control unit in a MIPS processor can be described as a set of simple logical formulas. If you click on the Edit menu and then choose Control unit... an editor window opens, and you can write new control unit logic in a simple language based on JavaScript. Here is a part of the code used for the control unit logic in the simulator:

    class Control {
        in  Instruction         // The entire instruction
        in  BranchCondition:1   // From branch condition
        out BranchControl:2     // 0: beq, bne
                                // 1: j, jal
                                // 2: jr, jalr
                                // 3: default
        out RegWrite:1          // 0: Do not update register, 1: update register
        out RegDst:1            // 0: rt dest reg, 1: rd dest reg (default)
        out ALUSrc:1            // 0: R[rt] ALU src, 1: Imm. ALU src
        out NotJal:1            // 0: jal, jalr, 1: default
        out ALUOp:4             // 0: pass (default)
                                // 1: from func field
                                // 2: add
                                // 3: sub
                                // 4: and
                                // 5: or
                                // 6: xor
                                // 7: lui
                                // 8: shift
                                // 9: slt
        out MemRead:3           // 0: no read (default)
                                // 1: word
                                // 2: half word, signed
                                // 3: byte, signed
                                // 6: half word, unsigned
                                // 7: byte, unsigned
        out MemWrite:2          // 0: No write (default)
                                // 1: word
                                // 2: half word
                                // 3: byte
        out MemToReg:1          // 0: memory result to reg
                                // 1: ALU result to reg (default)
        out UnknownOP:1

        script
            // Declare variables with default values
            var VBranchControl;
            var VRegWrite;
            var VRegDst;
            var VALUSrc;
            var VNotJal;
            var VALUOp;
            var VMemRead;
            var VMemWrite;
            var VMemToReg;

            function ALU_Rformat(Func) {
                switch (Func) {
                    case 0x08: // jr
                        VBranchControl = 2;
                    case 0x09: // jalr
                        VBranchControl = 2;
                        VNotJal = 0;
                    case 0x00: // sll
                    case 0x02: // srl
                    case 0x03: // sra
                    case 0x20: // add
                    case 0x22: // sub
                    case 0x24: // and
                    case 0x25: // or
                    case 0x26: // xor
                    case 0x27: // nor
                    case 0x2a: // slt
                        VRegWrite = ;
                        VALUOp = ;
                    default: // unimplemented or unknown instruction
                }
            }

            function OnInstruction() {
                // First compute good default values
                VBranchControl = 3;
                VRegWrite = 0;
                VRegDst = 1;
                VALUSrc = 0;
                VNotJal = 1;
                VALUOp = 0;
                VMemRead = 0;
                VMemWrite = 0;
                VMemToReg = 1;

                // The operator '>>>' means unsigned shift right
                OpCode = Instruction.Get() >>> 26;
                Func = Instruction.Get() & 0x3f;

                // Compute specific instruction values
                switch (OpCode) {
                    case 0x0: // r-format
                        ALU_Rformat(Func);
                    case 0x2: // j
                        VBranchControl = 1;
                    case 0x3: // jal
                    case 0x4: // beq
                        if (BranchCondition.Get() == 1)
                            VBranchControl = 0;
                    case 0x5: // bne
                    case 0x8: // addi
                        VRegWrite = 1;
                        VRegDst = 0;
                        VALUSrc = 1;
                        VALUOp = 2; // add
                    case 0xa: // slti
                    case 0xc: // andi
                    case 0xd: // ori
                    case 0xe: // xori
                    case 0xf: // lui
                    case 0x20: // lb
                    case 0x21: // lh
                    case 0x23: // lw
                    case 0x24: // lbu
                    case 0x25: // lhu
                    case 0x28: // sb
                    case 0x29: // sh
                    case 0x2b: // sw
                    default: // unknown op
                        UnknownOP.Set(1);
                }

                // Set output values
                BranchControl.Set(VBranchControl);
                RegWrite.Set(VRegWrite);
                RegDst.Set(VRegDst);
                ALUSrc.Set(VALUSrc);
                ALUOp.Set(VALUOp);
                NotJal.Set(VNotJal);
                MemRead.Set(VMemRead);
                MemWrite.Set(VMemWrite);
                MemToReg.Set(VMemToReg);
            }
        end_script

        event Instruction OnInstruction()
    }

Only the code between script and end_script should be typed into the control unit editor window. The rest of the code is already known by the simulator. The inputs are Instruction and BranchCondition. First, the code defines a set of variables for the outputs to be calculated. Then the function OnInstruction computes the outputs for the instruction, using the function ALU_Rformat as a helper for the register instructions. Finally, the variables are copied to the output signals.

Fill in all the missing parts with correct code so that you produce a working control unit logic. The language is based on JavaScript, and you can use all standard JavaScript constructs. The code above already contains all types of expressions needed, though.

Instruction Classes

Different classes of instructions use different parts of the components available in the data path. It is common to group MIPS instructions in four classes: arithmetic, load, store, and branch instructions. The instructions within one such class are quite similar, and it is often enough to understand one of them in order to understand all of them.
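As an illustration of the four instruction classes, the following is a minimal sketch in plain JavaScript that sorts a 32-bit instruction word into a class by its opcode field, using the opcode values that appear in the control unit listing. The function name instructionClass and the decision to count the immediate instructions (addi, slti, andi, ori, xori, lui) as arithmetic are assumptions made for this example; the function is not part of the simulator.

```javascript
// Hypothetical helper, not part of the simulator: classify a 32-bit MIPS
// instruction word using the opcode values from the control unit listing.
function instructionClass(word) {
    const opcode = word >>> 26;   // bits 31..26, unsigned shift right
    const func = word & 0x3f;     // bits 5..0, only meaningful for opcode 0

    switch (opcode) {
        case 0x00:                // r-format
            // jr (0x08) and jalr (0x09) are jumps, not arithmetic
            return (func === 0x08 || func === 0x09) ? "other" : "arithmetic";
        case 0x08: case 0x0a: case 0x0c:
        case 0x0d: case 0x0e: case 0x0f:
            return "arithmetic";  // immediate forms (assumed grouping)
        case 0x20: case 0x21: case 0x23: case 0x24: case 0x25:
            return "load";
        case 0x28: case 0x29: case 0x2b:
            return "store";
        case 0x04: case 0x05:
            return "branch";
        default:
            return "other";       // j, jal, and anything not listed
    }
}

// Example encodings (the branch offset 3 is an arbitrary illustration value)
console.log(instructionClass(0x012A4020)); // add t0, t1, t2
console.log(instructionClass(0x8D280000)); // lw  t0, 0(t1)
console.log(instructionClass(0xAD280004)); // sw  t0, 4(t1)
console.log(instructionClass(0x11090003)); // beq t0, t1, <offset 3>
```

Every case here ends in a return, so no case can fall through into the next one. In a switch that assigns variables instead of returning, JavaScript continues into the following case unless a break statement ends the case.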

Assignment 1

The arithmetic instructions are sometimes also called register instructions because they perform an operation with two source registers and one destination register. You will now study how an arithmetic instruction goes through the pipeline.

Create a new project, type in the following program and build it. To be able to run it on the pipeline simulator, open the program called mipspipe2000.exe. Choose Load Pipeline from the File menu, open the directory called S-script and choose the file called s.dit to load the small version of the pipeline. To open your program, choose Open from the File menu or from the toolbar, go to the directory where you saved the program, open the directory called Objects and open the file of type .out.

    # Laboratory Exercise 7, Assignment 1
    # Written by Jan Eric Larsson, 26 November 1998

    start:  add     t0, t1, t2

Insert some distinct values in t1 and t2 and start to execute the program by single stepping through each pipeline stage.

What happens when the instruction goes through the first pipeline stage, the IF stage? Describe all signals, register changes, and other effects in detail.

What happens in the second (ID) stage?

What happens in the third (EX) stage?

What happens in the fourth (MEM) stage?

What happens in the fifth (WB) stage?

What do IF, ID, EX, MEM and WB mean?

How many clock cycles does it take for the result of the operation to be available in the destination register?

In which pipeline stages do different arithmetic instructions differ?

One stage is not used by arithmetic instructions. Which one? Why?

Assignment 2

Replace the instruction add t0, t1, t2 in the program above with the instruction lw t0, 0(t1), then build, upload and investigate the program. Note that you must add a data variable from which to load a value.

What happens in the different pipeline stages?

What arithmetic operation does the ALU perform? Why?

How many clock cycles does it take for the destination register to receive its value?

Are all pipeline stages used? Explain.

Assignment 3

Now investigate the instruction sw t0, 4(t1) in the same way as with the other instructions above.

What happens in the different pipeline stages?

What arithmetic operation does the ALU perform? Why?

Are all pipeline stages used? Explain.

Assignment 4

Finally, investigate the instruction beq t0, t1, Dest in the same way as with the other instructions above. Note that you must add a label named Dest somewhere.

What happens in the different pipeline stages?

What arithmetic operation does the ALU perform? Why?

Are all pipeline stages used? Explain.

A Small Program Example

You will now investigate how a small program with several instructions goes through the pipeline so that the different stages of the instructions are executed in parallel. Study the following program, from Laboratory Exercise 1.

    # Laboratory Exercise 1, Home Assignment 2
    # Written by Jan Eric Larsson, 27 October 1998

                                    # Avoid reordering instructions
                                    # Start generating instructions
                                    # The label should be globally known
                                    # The label marks an entry point
    start:  lui     $9, 0xbf90      # Load upper half of port address
                                    # Lower half is filled with zeros
    repeat: lbu     $8, 0x0($9)     # Read from the input port
                                    # Needed after load
            sb      $8, 0x0($9)     # Write to the output port
            b       repeat          # Repeat the read and write cycle
                                    # Needed after branch
            li      $8, 0           # Clear the register
                                    # Marks the end of the program

Assignment 5

Upload the program above to the pipeline simulator and execute it step by step. Carefully note when the instructions are launched and when their results are ready.

Problems with Pipelining

Pipelining can make a processor run up to N times faster than when instructions are executed one at a time, where N is the number of pipeline stages. However, there are several effects that cause problems and make it impossible to reach this efficiency in practice. One is that not all instructions use all pipeline stages. You will now investigate a few others.
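The parallel execution of stages and the "up to N times faster" claim can be illustrated with a minimal sketch of an ideal five-stage pipeline. It assumes that one instruction enters the IF stage every clock cycle and that no hazards, stalls, or taken branches occur, which is exactly the idealization the assignments go on to question. The names STAGES, stageAt, totalCycles, and diagram are hypothetical and unrelated to the simulator.

```javascript
// Ideal five-stage pipeline, ignoring all hazards (an assumption).
const STAGES = ["IF", "ID", "EX", "MEM", "WB"];

// Instruction i (0-based) enters IF in clock cycle i + 1.
// Returns the stage it occupies in the given cycle, or null if none.
function stageAt(instrIndex, cycle) {
    const s = cycle - 1 - instrIndex;
    return (s >= 0 && s < STAGES.length) ? STAGES[s] : null;
}

// Cycles until the last of n instructions has left the WB stage.
function totalCycles(n) {
    return n + STAGES.length - 1;
}

// One text row per instruction, one column per clock cycle.
function diagram(names) {
    const rows = [];
    for (let i = 0; i < names.length; i++) {
        const cells = [];
        for (let c = 1; c <= totalCycles(names.length); c++) {
            cells.push((stageAt(i, c) || "").padEnd(4));
        }
        rows.push(names[i].padEnd(8) + cells.join(""));
    }
    return rows.join("\n");
}

// The instructions visible in the listing, in static program order
console.log(diagram(["lui", "lbu", "sb", "b", "li"]));
```

Without pipelining, n instructions would need n times 5 cycles in this model, while the pipeline needs n + 4. The ratio approaches 5, the number of stages, only as n grows large and only if nothing ever stalls.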

Assignment 6

Run the following program on the pipeline simulator. Assign distinct values to t0, t1 and t3, and single step through the instructions.

    # Laboratory Exercise 7, Assignment 6
    # Written by Jan Eric Larsson, 26 November 1998

    start:  add     t2, t0, t1
            add     t4, t2, t3

After how many clock cycles will the destination register of the first add instruction, t2, receive the correct result value?

After how many clock cycles is the value of t2 needed in the second instruction?

What is the problem here? What is this kind of hazard called?

This kind of problem can be solved by code reordering, introduction of nop instructions, stalling the pipeline (hazard detection) and by forwarding. Explain when the first three methods can be used and how they work.

Both the hardware MIPS processor and the simulator use forwarding to solve problems with data hazards. So far, you have used a version of the pipeline, S-script, that does not have forwarding. Switch to the pipeline in the directory Xl-script. This version does have forwarding. Then single step through the program above and study how the forwarding works.

How does the forwarding unit detect that forwarding should be used?

From where to where is data forwarded in the case above?

Assignment 7

Run the following program on the pipeline simulator. Use the simple version without forwarding (S-script). Assign the same value to t0 as to t1 and single step through the instructions.

    # Laboratory Exercise 7, Assignment 7
    # Written by Jan Eric Larsson, 22 February 1999

    start:  beq     t0, t1, start
            addi    t0, t0, 1

How many cycles does it take until the branch instruction is ready to jump?

What has happened to the following addi instruction while the branch is calculated?

What is the problem here? What is this kind of hazard called?

What are the possible solutions to this problem?

Switch to the forwarding version (Xl-script) again. How does this version handle beq?

Assignment 8

Run the following program on the pipeline simulator. Assign distinct values to t0 and t1 and set t2 to contain the address of a memory location whose contents you know. Then single step through the instructions.

    # Laboratory Exercise 7, Assignment 8
    # Written by Jan Eric Larsson, 22 February 1999

    start:  lw      t0, 0(t2)
            add     t1, t1, t0

After how many clock cycles will the destination register of the load instruction, t0, receive the correct result value?

After how many clock cycles is the value of t0 needed in the add instruction?

What is the problem here? What is this kind of hazard called?

This kind of problem can be solved with forwarding or hazard detection and stalling, just like other data hazards, but most MIPS implementations do not have these for load. What are the alternative solutions that can be used?

Does the forwarding version of the simulator, Xl-script, handle the problem with delayed load?

Assignment 9

The control unit in a MIPS processor can be described as a set of simple logical formulas. If you click on the Edit menu and then choose Control unit... an editor window opens, and you can write new control unit logic in a simple language based on JavaScript.

Enter the control unit logic code from Home Assignment 1 in the control unit editor window. Run the simulator and verify that your control unit works correctly by running some of the previous assignments again.
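The data hazards in Assignments 6 and 8 come down to comparing register numbers held in the pipeline registers. The following is a minimal sketch of the textbook-style conditions, assuming pipeline registers named exMem, idEx, and ifId with fields regWrite, memRead, rd, rs, and rt. These names and the two functions are hypothetical, and the simulator's Xl-script pipeline may implement the checks differently.

```javascript
// Usual MIPS register numbers for the registers used in the assignments.
const REG = { zero: 0, t0: 8, t1: 9, t2: 10, t3: 11, t4: 12 };

// Forward the ALU result waiting in EX/MEM to the instruction now in EX
// when that result is the register the younger instruction reads.
function forwardFromExMem(exMem, idEx) {
    const producesValue = Boolean(exMem.regWrite) && exMem.rd !== REG.zero;
    return {
        forwardA: producesValue && exMem.rd === idEx.rs, // first ALU operand
        forwardB: producesValue && exMem.rd === idEx.rt  // second ALU operand
    };
}

// A load in EX followed directly by a reader of its destination cannot be
// fixed by forwarding alone: the data only exists after the MEM stage.
function loadUseHazard(idEx, ifId) {
    return Boolean(idEx.memRead) &&
           (idEx.rt === ifId.rs || idEx.rt === ifId.rt);
}

// Assignment 6: add t2, t0, t1 followed by add t4, t2, t3
console.log(forwardFromExMem(
    { regWrite: true, rd: REG.t2 },
    { rs: REG.t2, rt: REG.t3 }
));

// Assignment 8: lw t0, 0(t2) followed by add t1, t1, t0
console.log(loadUseHazard(
    { memRead: true, rt: REG.t0 },
    { rs: REG.t1, rt: REG.t0 }
));
```

The check against register zero is there because that register always reads as zero, so a result nominally written to it must never be forwarded.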

Assignment 10

Study the following code. It consists of a loop in which the register t2 works as a loop counter and is counted down to zero. Each time the loop is executed, one word of memory is loaded, incremented by 1 and stored again, and then the same is done with another word.

    # Laboratory Exercise 7, Assignment 10
    # Written by Jan Eric Larsson, 22 February 1999

            li      t2, 1000
    Loop:   lw      t0, 0(s0)
            add     t0, t0, s1
            sw      t0, 0(s0)
            lw      t1, 4(s0)
            add     t1, t1, s1
            sw      t1, 4(s0)
            subi    t2, t2, 1
            bne     t2, zero, Loop

Write a full program with this loop in it, test it on the ordinary simulator and then on the lab processor. The two functions timer_start() and timer_stop() can be used to measure execution time. The function timer_start() tells the timing to start, while timer_stop() stops the timer and returns the time since the start, in microseconds on the lab computer and in clock cycles on the simulator. Use these two functions to time the loop above. Choose an initial counter value for t2 so that the loop takes several milliseconds. Write down how many milliseconds it takes to execute the loop.

Assignment 11

The loop shown in Assignment 10 can be written more efficiently. How? Try to reorder the code in the loop so that it becomes more efficient and time the new version of the loop. How much faster can you make it?

Conclusions

Before you finish the laboratory exercise, think about the questions below:

How can a pipelined processor be faster than one without pipelining?

What are the special problems that appear in pipelining?

How can these problems be solved?

Is the lab computer more like S-script or Xl-script?

Is there a difference in writing compilers for pipelined processors?

Which is faster: straight code or code with many branches?

What does RISC mean? What does CISC mean?

Is the Pentium processor pipelined? Pentium Pro? Pentium 2?
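When looking at the loop from Assignment 10, it can help to count mechanically how often a load is followed immediately by an instruction that reads the loaded register. The sketch below does this for the loop body as written. The record layout (op, dest, srcs) and the function countLoadUsePairs are assumptions made for this example, and only lw is treated as a load.

```javascript
// The loop body from Assignment 10, written as simple records.
// dest is the register written, srcs are the registers read.
const loopBody = [
    { op: "lw",   dest: "t0", srcs: ["s0"] },
    { op: "add",  dest: "t0", srcs: ["t0", "s1"] },
    { op: "sw",   dest: null, srcs: ["t0", "s0"] },
    { op: "lw",   dest: "t1", srcs: ["s0"] },
    { op: "add",  dest: "t1", srcs: ["t1", "s1"] },
    { op: "sw",   dest: null, srcs: ["t1", "s0"] },
    { op: "subi", dest: "t2", srcs: ["t2"] },
    { op: "bne",  dest: null, srcs: ["t2", "zero"] }
];

// Count adjacent pairs where a load is directly followed by an
// instruction that reads the register being loaded.
function countLoadUsePairs(instrs) {
    let count = 0;
    for (let i = 0; i + 1 < instrs.length; i++) {
        const cur = instrs[i];
        const next = instrs[i + 1];
        if (cur.op === "lw" && next.srcs.includes(cur.dest)) {
            count++;
        }
    }
    return count;
}

console.log(countLoadUsePairs(loopBody)); // 2
```

Running the same function on a candidate reordering of the loop body gives a quick check of whether the reordering removed these pairs. It says nothing about whether the reordered code still computes the same result, which has to be verified separately.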