MIPS ISA AND PIPELINING OVERVIEW Appendix A and C


OUTLINE
- Review of the MIPS ISA
- Review of pipelining

READING ASSIGNMENT
- Read Appendix A
- Read Appendix C

THE MIPS ISA (A.9)
- First MIPS in 1985
- General-purpose RISC, load-store architecture
- MIPS provides a good architectural model for study: popular and easy
- MIPS emphasizes
  - A simple load-store instruction set
  - Design for pipelining efficiency
  - Efficiency as a compiler target
- Several models (e.g., MIPS64)

THE MIPS ISA (A.9)
- Fixed instruction encoding
- Supports these addressing modes
  - Displacement (offset 12-16 bits)
  - Immediate (8-16 bits)
  - Register indirect
- Supports these data sizes and types
  - 8-, 16-, 32-, and 64-bit integers
  - 32-bit and 64-bit IEEE 754 floating-point numbers
- Supports these simple instructions
  - Load, store, add, subtract, move register-register, shift
  - Compare equal, compare not equal, compare less, branch (with a PC-relative address at least 8 bits long), jump, call, and return
- Registers
  - Integer registers
  - Floating-point registers

THE MIPS ISA (A.9)
- Registers for MIPS64
  - 32 64-bit GPRs or integer registers (R0, ..., R31); R0 is always 0
  - 32 64-bit FPRs (F0, ..., F31); each holds one 32-bit single-precision value or one 64-bit double-precision value
  - Special: Exception, PC, FP status
- Data types for MIPS64
  - 8-bit bytes, 16-bit half words, 32-bit words, and 64-bit double words for integer data
  - 32-bit single precision and 64-bit double precision for floating point
  - Data narrower than a register can be loaded with sign or zero extension
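The difference between sign and zero extension on a load can be illustrated with a small Python sketch. The helper names `zero_extend` and `sign_extend` are hypothetical, chosen only for this illustration; they model what happens to the bit pattern, not any particular MIPS instruction.

```python
MASK64 = (1 << 64) - 1  # a MIPS64 GPR is 64 bits wide


def zero_extend(value: int, bits: int) -> int:
    """Treat the low `bits` bits of value as an unsigned number."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Treat the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


byte = 0xF0  # top bit set, so the two extensions differ
unsigned_view = zero_extend(byte, 8)
signed_view = sign_extend(byte, 8)

# Bit pattern a 64-bit register would hold after a sign-extending byte load.
register_pattern = signed_view & MASK64

print(unsigned_view, signed_view, hex(register_pattern))
```

A byte whose top bit is clear, such as 0x7F, gives the same result under both extensions.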

THE MIPS ISA (A.9)
- Addressing modes for MIPS data transfers
  - Register addressing
  - Immediate
  - Displacement
  - Direct
- Memory addressing
  - 64-bit byte addresses
  - Big Endian or Little Endian (mode bit)
  - Aligned memory addressing!
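As a rough illustration of byte ordering and alignment, the Python sketch below lays out one 32-bit word in both byte orders and checks whether an address is a multiple of the access size. The function name `is_aligned` and the sample addresses are assumptions made for this example.

```python
def is_aligned(address: int, size_bytes: int) -> bool:
    """An access is aligned when the address is a multiple of its size."""
    return address % size_bytes == 0


word = 0x11223344

# Big Endian stores the most significant byte at the lowest address,
# Little Endian stores the least significant byte there.
big_endian_bytes = word.to_bytes(4, "big")
little_endian_bytes = word.to_bytes(4, "little")

print(big_endian_bytes.hex(), little_endian_bytes.hex())

# A 64-bit double word needs an address that is a multiple of 8.
print(is_aligned(1024, 8), is_aligned(1028, 8))
```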

THE MIPS ISA (A.9)
MIPS Instruction Format

THE MIPS ISA (A.9)
MIPS Operations
- Loads and stores
- ALU operations
- Branches and jumps
- Floating-point

THE MIPS ISA (A.9)
MIPS Operations

THE MIPS ISA (A.9)
MIPS Control Flow Instructions

THE MIPS ISA (A.9)
MIPS Floating-Point Operations
- Manipulate the floating-point registers in either single or double precision
- Single-precision operations: ADD.S, SUB.S, MUL.S, DIV.S
- Double-precision operations: ADD.D, SUB.D, MUL.D, DIV.D
- Floating-point compares set a bit in the special floating-point status register that can be tested with a pair of branches: BC1T and BC1F

THE MIPS ISA (A.9)
Example

unsigned long int x[32];
for (k = 0; k < 32; k++) {
    x[k] = x[k] * 54;
}
=====================================================
      LD     R1, 1024(R0)   // ADDRESS OF X IS IN MEM[1024]
      DADDU  R2, R0, R0     // K = 0
      DADDIU R4, R0, 54     // CONSTANT 54 IN R4
L:    LD     R3, 0(R1)      // LOAD X[K]
      DMULU  R3, R3, R4     // X[K]*54
      SD     R3, 0(R1)      // UPDATE X[K]
      DADDIU R1, R1, 8      // NEXT ELEMENT
      DADDIU R2, R2, 1      // INCREMENT K
      SLTIU  R5, R2, 32     // R5 = 1 IF K < 32
      BNE    R5, R0, L      // LOOP WHILE K < 32
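A short Python model of the example loop may help check what the assembly computes and how many instructions it executes. This is only a sketch: it assumes the array elements are unsigned 64-bit values and that the multiply keeps the low 64 bits of the product. The names `scale_array` and `dynamic_instruction_count` are made up for this illustration.

```python
MASK64 = (1 << 64) - 1  # assumed element width: unsigned 64-bit


def scale_array(x, factor=54):
    """Model of the loop body: x[k] = x[k] * factor, truncated to 64 bits."""
    return [(value * factor) & MASK64 for value in x]


def dynamic_instruction_count(iterations=32, setup=3, loop_body=7):
    """Instructions executed: 3 before the label L, then 7 per iteration."""
    return setup + loop_body * iterations


x = list(range(32))
print(scale_array(x)[:4])
print(dynamic_instruction_count())
```

The three setup instructions are the ones before the label L; the seven loop instructions run once per element, so the 32-element loop executes 3 + 7 x 32 = 227 instructions.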

PIPELINING
- Key implementation technique used to make fast CPUs
- Pipelining is an implementation technique whereby multiple instructions are overlapped in execution to take advantage of parallelism among the actions (steps) needed to execute an instruction
- Invisible to the programmer!
- Instruction execution is split into steps (stages)
  - Each stage finishes part of the instruction
  - All stages are given the same time to finish (one processor cycle!)
- Improves throughput, not latency!
- The depth of the pipeline determines the speedup
  - Ideally, speedup equals the number of pipeline stages
  - Longer is better, but more complex and expensive!

SIMPLE IMPLEMENTATION OF A RISC ISA
- Properties that make it easy
  - All operations on data apply to data in registers and typically change the entire register
  - The only operations that affect memory are load and store operations
  - Few, fixed-size instruction formats
- 5-stage implementation
  - Instruction fetch cycle (IF)
  - Instruction decode/register fetch cycle (ID)
  - Execution/effective address cycle (EX)
  - Memory access (MEM)
  - Write-back cycle (WB)
- Cycles per instruction type
  - Load: 5
  - ALU and Store: 4
  - Branch and Jump: 2

SIMPLE IMPLEMENTATION OF A RISC ISA
For N instructions on the 5-stage implementation:
- Cycles for unpipelined = 5N
- Cycles for pipelined = 5 + (N - 1)
- Speedup = 5N / (N - 1 + 5)
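The speedup formula above can be evaluated for a few instruction counts to see how it approaches the number of stages. The Python sketch below assumes, as the formula does, that every instruction takes all five stages and that there are no stalls; the function name `pipeline_speedup` is chosen for this illustration.

```python
def pipeline_speedup(n_instructions: int, stages: int = 5) -> float:
    """Speedup = (stages * N) / (stages + N - 1), assuming no stalls."""
    unpipelined_cycles = stages * n_instructions
    # The first instruction needs `stages` cycles; each later one finishes
    # one cycle after its predecessor.
    pipelined_cycles = stages + (n_instructions - 1)
    return unpipelined_cycles / pipelined_cycles


for n in (1, 5, 100, 10_000):
    print(n, round(pipeline_speedup(n), 3))
```

For a single instruction there is no gain at all, and the speedup only approaches 5 as N grows, because the fill time of the pipeline is amortized over more instructions.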

SIMPLE IMPLEMENTATION OF A RISC ISA
- Pipeline registers
- Separate memories

SIMPLE IMPLEMENTATION OF A RISC ISA
Pipelining performance

                          Single cycle   Unpipelined   Pipelined
Cycle time                400 ns         100 ns        100 ns
Cycles per instruction    1              by mix        1

- Instruction mix: 50% ALU, 30% MEM, 20% Branch
- Unpipelined cycles: 4 cycles ALU, 2 cycles Branch, 5 cycles memory
- Time Single = IC x 1 x 400 = 400 IC
- Time Unpipe = IC x (0.5 x 4 + 0.2 x 2 + 0.3 x 5) x 100 = 390 IC
- Time Pipe = IC x 1 x 100 = 100 IC
- Speedup?
- Ideally, speedup is 5, but
  - Unbalanced stages
  - Instruction mix
  - Time to fill/empty
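The "Speedup?" question on this slide can be worked out directly from the numbers given. The Python sketch below repeats the arithmetic per instruction (the instruction count IC cancels out in every ratio); the variable names are chosen for this illustration.

```python
# Instruction mix from the slide: (fraction of instructions, unpipelined cycles)
MIX = {"alu": (0.5, 4), "branch": (0.2, 2), "mem": (0.3, 5)}


def time_per_instruction_ns(cpi: float, cycle_time_ns: float) -> float:
    """Average time per instruction = CPI x cycle time."""
    return cpi * cycle_time_ns


cpi_unpipelined = sum(fraction * cycles for fraction, cycles in MIX.values())

t_single = time_per_instruction_ns(1, 400)
t_unpipe = time_per_instruction_ns(cpi_unpipelined, 100)
t_pipe = time_per_instruction_ns(1, 100)

speedup_vs_single = t_single / t_pipe
speedup_vs_unpipe = t_unpipe / t_pipe

print(round(cpi_unpipelined, 2), round(speedup_vs_single, 2), round(speedup_vs_unpipe, 2))
```

With these figures the pipelined design is 4 times faster than the single-cycle design and 3.9 times faster than the unpipelined one, both short of the ideal factor of 5.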

PIPELINING HAZARDS
- Hazards are the occasions in which the next instruction in the stream is prevented from executing during its designated clock cycle
- Types
  - Structural: two instructions need access to the same hardware unit in the same cycle
  - Data: one instruction requires the result of a previous instruction that is still in the pipeline
  - Control: instructions that change the PC
- Solving hazards can be done by stalling the pipeline
  - Conflicting instructions are paused
  - Earlier instructions proceed
- Stalls reduce the performance of the pipeline!

PIPELINING HAZARDS
Structural Hazards
- Solution
  - Duplicate units
  - Pipeline units
- Cost vs. improvement!

PIPELINING HAZARDS
Data Hazards
- Forwarding?
- Register file?

PIPELINING HAZARDS
Data Hazards
- Forwarding does not solve all hazards!
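A common case that forwarding cannot cover in the classic 5-stage pipeline is a load followed immediately by an instruction that uses the loaded register: the data only arrives at the end of MEM, after the next instruction's EX stage has started. The Python sketch below counts such load-use stalls. It is a simplified model under stated assumptions (full forwarding, one stall per load-use pair, every listed source needed in EX), and the `Instr` type and register names are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, e.g. "LD" or "DSUB"
    dest: Optional[str]     # destination register, None if nothing is written
    srcs: Tuple[str, ...]   # source registers read by the instruction


def load_use_stalls(program: Sequence[Instr]) -> int:
    """Count one-cycle stalls that remain even with full forwarding."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        # Only a load directly followed by a consumer of its result stalls.
        if prev.op == "LD" and prev.dest is not None and prev.dest in cur.srcs:
            stalls += 1
    return stalls


program = [
    Instr("LD", "R1", ("R2",)),
    Instr("DSUB", "R4", ("R1", "R5")),  # needs R1 one cycle too early: stall
    Instr("AND", "R6", ("R1", "R7")),   # far enough away: forwarding suffices
    Instr("OR", "R8", ("R1", "R9")),
]

print(load_use_stalls(program))
```

Placing an independent instruction between the load and its first use would remove the stall, which is why compilers try to schedule around loads.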


PIPELINING HAZARDS
Control Hazards
- They have a greater impact on performance than data hazards!
- Recall that if a branch changes the PC to its target address, it is a taken branch; if it falls through, it is not taken, or untaken
- Should we fetch the following instruction or the one at the branch target?
- Wait for the branch decision! Stall! Expensive!
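How expensive stalling on every branch is can be estimated with the usual relation CPI = 1 + branch frequency x branch penalty. The Python sketch below uses the 20% branch frequency from the instruction mix given earlier; the penalty values of 1 to 3 cycles are assumptions for illustration, not figures from the slides.

```python
def cpi_with_branch_stalls(branch_fraction: float, penalty_cycles: int,
                           base_cpi: float = 1.0) -> float:
    """Effective CPI when every branch stalls the pipeline for a fixed penalty."""
    return base_cpi + branch_fraction * penalty_cycles


def speedup_with_branch_stalls(depth: int, branch_fraction: float,
                               penalty_cycles: int) -> float:
    """Pipeline speedup = depth / (1 + branch frequency x branch penalty)."""
    return depth / cpi_with_branch_stalls(branch_fraction, penalty_cycles)


for penalty in (1, 2, 3):  # assumed penalties, in cycles
    print(penalty,
          round(cpi_with_branch_stalls(0.2, penalty), 2),
          round(speedup_with_branch_stalls(5, 0.2, penalty), 2))
```

Even a one-cycle penalty on 20% of the instructions raises the CPI by a fifth, which is the motivation for the prediction schemes that follow.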

PIPELINING HAZARDS
Control Hazards
- Treat the branch as not taken!
- Treat the branch as taken!
- Static prediction!
- If the prediction is wrong, flush the pipeline!

PIPELINING HAZARDS
Control Hazards
- Delayed branch!
- Compiler!

PIPELINING HAZARDS
Control Hazards
- Dynamic branch prediction!
- Use a branch-prediction buffer or branch history table (BHT)
  - The buffer is addressed by the lower portion of the branch instruction's address
  - Each location contains bit(s) that give the prediction for the branch (1 or 2 bits)
  - The buffer is essentially a small cache shared by all branch instructions
  - Accuracy depends on how often the prediction is for the branch of interest and on how accurate that prediction is
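A branch history table with 2-bit saturating counters can be sketched in a few lines of Python. This is a minimal model, not a description of any specific processor: the table size of 16 entries, the initial counter value, the 4-byte instruction size used for indexing, and the names `TwoBitPredictor` and `run_branch` are all assumptions made for this illustration.

```python
class TwoBitPredictor:
    """Branch history table of 2-bit saturating counters (0-1: not taken, 2-3: taken)."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.counters = [0] * (1 << index_bits)  # assumed start: strongly not taken

    def index(self, pc: int) -> int:
        # Use the low address bits, skipping the 2 offset bits of a 4-byte instruction.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run_branch(predictor: TwoBitPredictor, pc: int, outcomes) -> float:
    """Feed a sequence of outcomes for one branch; return the fraction predicted correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A loop-closing branch in a 32-iteration loop: taken 31 times, then not taken.
loop_outcomes = [True] * 31 + [False]
bht = TwoBitPredictor()
print(run_branch(bht, 0x100, loop_outcomes))
```

Because only the low address bits select an entry, two branches whose addresses differ by a multiple of the table span (here 64 bytes) share a counter, which is the sense in which the buffer is shared by all branch instructions.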

PIPELINING HAZARDS
Control Hazards
