INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

UNIVERSIDADE TÉCNICA DE LISBOA
INSTITUTO SUPERIOR TÉCNICO
Departamento de Engenharia Informática

Architectures for Embedded Computing
MEIC-A, MEIC-T, MERC
Lecture Slides, Version 3.0 - English

Lecture 05
Title: Pipelining - Basic Principles
Summary: Pipelining (analysis of the instruction execution, implementation and performance analysis); Hazards (structural, data and control).

2010/2011
Nuno.Roma@ist.utl.pt

Architectures for Embedded Computing
Pipelining: Basic Principles
Prof. Nuno Roma, ACE 2010/11 - DEI-IST

Previous Class
In the previous class...
- Code Generation:
  - Types of Assembly Instructions;
  - Control Instructions;
  - Compiler's Role;
- MIPS Logic Architecture.

Road Map
[Figure: course road map]

Summary
Today:
- Pipelining:
  - Analysis of the instruction execution;
  - Implementation;
  - Performance analysis;
- Hazards:
  - Structural;
  - Data;
  - Control.
Bibliography: Computer Architecture: A Quantitative Approach, Sections A.1 - A.3

The Laundry Analogy for Pipelining
Four loads, each one taking 4 × 30 min = 2 hours, to: Wash; Dry; Fold; Store.
Sequential approach: Total time = 4 loads × 2 hours = 8 hours!
Pipeline approach: Total time = (4 + 3) × 30 min = 3.5 hours, since a new load starts every 30 minutes while the previous ones move on to the next step.
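The two totals of the laundry analogy can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and it assumes that each of the four steps has its own dedicated resource, so that loads can overlap completely.

```python
STAGE_MINUTES = 30                       # duration of each step, as on the slide
STAGES = ["Wash", "Dry", "Fold", "Store"]
LOADS = 4

def sequential_minutes(loads, n_stages, stage_minutes):
    # Each load runs through all steps before the next load starts.
    return loads * n_stages * stage_minutes

def pipelined_minutes(loads, n_stages, stage_minutes):
    # The first load needs n_stages steps; every further load
    # finishes exactly one step after the previous one.
    return (n_stages + loads - 1) * stage_minutes

seq = sequential_minutes(LOADS, len(STAGES), STAGE_MINUTES)
pipe = pipelined_minutes(LOADS, len(STAGES), STAGE_MINUTES)
print(seq / 60, pipe / 60)  # 8.0 3.5
```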

Execution of an Assembly Instruction
Execution phases of an Assembly instruction:
1. (F) Instruction Fetch: Read the instruction into an internal register and increment the program counter (PC).
2. (D) Instruction Decode: Interpret the instruction encoding fields to determine the type of instruction; copy the operands to temporary registers.
3. (X) Execution: Compute the instruction result.
4. (W) Write Back: Write the result to the destination specified by the instruction.

Execution in a CISC Processor
Structure of a CISC processor:
[Figure: block diagram with Register Bank, PC, IR, Control Unit, Op1, Op2, ALU, Res, Processing Unit, Address Bus and Data Bus]
Main modules:
- Control Unit;
- Processing Unit: Register Bank and Arithmetic-Logic Unit (ALU).
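The four execution phases (F, D, X, W) can be illustrated with a toy interpreter. This is a minimal sketch and not the processor from the slides: the instruction format (opcode, dest, src1, src2), the two opcodes and the eight-register bank are all hypothetical choices made for illustration.

```python
# Hypothetical toy machine: instructions are (opcode, dest, src1, src2) tuples.
program = [
    ("ADD", 1, 2, 3),   # R1 <- R2 + R3
    ("SUB", 4, 1, 5),   # R4 <- R1 - R5
]
regs = [0] * 8          # hypothetical register bank
regs[2], regs[3], regs[5] = 10, 20, 4

ALU = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b}

def run(program, regs):
    pc = 0
    while pc < len(program):
        ir = program[pc]                     # (F) fetch into the instruction register
        pc += 1                              #     and increment the PC
        opcode, dest, src1, src2 = ir        # (D) decode the fields
        op1, op2 = regs[src1], regs[src2]    #     copy operands to temporaries
        res = ALU[opcode](op1, op2)          # (X) compute the result
        regs[dest] = res                     # (W) write it to the destination
    return regs

run(program, regs)
print(regs[1], regs[4])  # 30 26
```

Here each instruction finishes all four phases before the next one is fetched, which is exactly the serial behaviour that pipelining removes.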

Execution in a CISC Processor
The Processing Unit is used repeatedly in all phases of the instruction execution process. Three consecutive instructions, each with a different number of decode and execute steps:
  F D1 D2 X1 X2 X3 X4 W
  F D1 D2 D3 X1 W
  F D1 X1 X2 X3 W
- Instructions may be as complex as necessary;
- Difficult to parallelize the instruction execution process.

Characteristics of a RISC Processor
- All instructions take the same amount of time to execute;
- Simple instructions: only those implemented by the ALU;
- Only immediate and register addressing modes;
- Assembly instructions with rigid encoding formats.

CISC vs RISC Comparison

Phase  CISC                                      RISC
F      IR ← M[PC], PC += instLen                 IR ← M[PC], PC++
D      Several different addressing modes        Only by register or immediate
X      Arbitrary sequence of operations in ALU   Only one operation in ALU
W      Result is written into a register         Result is written into a register
       or memory position                        or memory position

Processing Phases of MIPS Processor
One additional phase for memory read and write, only used by load and store instructions:
  F - Instruction Fetch
  D - Instruction Decode
  X - Execution
  M - Memory Access
  W - Write-Back
Each phase takes only one clock cycle.

1. (F) Fetch
   - IR ← M[PC], PC ← PC + 4.
2. (D) Instruction Decode
   - Decode the instruction;
   - Read operands from the register bank;
   - Sign extension of constants.
3. (X) Execution
   - ALU operations with 2 registers;
   - ALU operations with 1 register and one constant;
   - Effective address calculation.
4. (M) Memory Access
   - If load: read from data memory;
   - If store: write to data memory;
   - Branch resolution.
5. (W) Write-Back
   - Write the result to the register bank (from either an ALU operation or a load instruction).
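Two of the steps listed above, the sign extension of constants in D and the effective address calculation in X, can be sketched as follows. The 16-bit immediate width and the 32-bit address wrap-around are assumptions of this sketch (they match the classic 32-bit MIPS I-format), and the function names are illustrative.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base, imm16):
    # Address = base register + sign-extended offset, wrapped to 32 bits (assumed).
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

print(sign_extend16(0x0004))                   # 4
print(sign_extend16(0xFFFC))                   # -4
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```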

MIPS Processor Architecture
- Each execution phase corresponds to one pipeline stage;
- Each pipeline stage is characterized by an autonomous processing capability;
- Intermediate results are stored in registers between the pipeline stages;
- Processing speed is defined by the slowest pipeline stage.
[Figure: MIPS processor architecture]
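The last point, that processing speed is defined by the slowest stage, can be made concrete with a small calculation. The stage delays below are hypothetical values chosen for illustration, and the sketch ignores the delay added by the registers between stages.

```python
# Hypothetical combinational delay of each stage, in nanoseconds.
stage_delay_ns = {"F": 2.0, "D": 1.5, "X": 2.5, "M": 3.0, "W": 1.0}

# All stages share one clock, so the period must fit the slowest stage.
t_clk_pipe_ns = max(stage_delay_ns.values())
slowest = max(stage_delay_ns, key=stage_delay_ns.get)

# Time wasted per cycle in every stage that is faster than the slowest one.
idle_ns = {s: t_clk_pipe_ns - d for s, d in stage_delay_ns.items()}

print(slowest, t_clk_pipe_ns)  # M 3.0
print(idle_ns["W"])            # 2.0
```

With these numbers the W stage sits idle for two thirds of every cycle, which is why balanced stages matter for pipeline performance.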

             Clock Cycle
Instruction  1  2  3  4  5  6  7  8  9
i            F  D  X  M  W
i + 1           F  D  X  M  W
i + 2              F  D  X  M  W
i + 3                 F  D  X  M  W
i + 4                    F  D  X  M  W

All instructions must pass through all pipeline stages, whether they use them or not!
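The diagram above follows a simple rule: instruction i + k occupies stage number s in clock cycle k + s + 1. A short sketch that regenerates it for an ideal, stall-free pipeline (the function name and the "." marker for idle cycles are illustrative choices):

```python
STAGES = "FDXMW"

def pipeline_diagram(n_instructions, stages=STAGES):
    """Return one row per instruction for an ideal pipeline without stalls."""
    n_cycles = len(stages) + n_instructions - 1
    rows = []
    for k in range(n_instructions):
        row = ["."] * n_cycles           # "." marks a cycle with no activity
        for s, stage in enumerate(stages):
            row[k + s] = stage           # instruction k enters stage s in cycle k + s + 1
        rows.append(row)
    return rows

rows = pipeline_diagram(5)
for k, row in enumerate(rows):
    print(f"i+{k}", " ".join(row))
```

Five instructions complete in 5 + 5 - 1 = 9 clock cycles, matching the nine columns of the diagram.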

Pipeline Speedup

$$\text{Speedup}_{pipe} = \frac{\text{Average Time without Pipeline}}{\text{Average Time with Pipeline}} = \frac{CPI_{serial} \times T_{clk,serial}}{CPI_{pipe} \times T_{clk,pipe}} = \frac{CPI_{serial}}{CPI_{pipe}} \times \frac{T_{clk,serial}}{T_{clk,pipe}}$$

Ideal case: $CPI_{pipe} = 1$ and $CPI_{serial} = \#stages$, so

$$\text{Speedup}_{pipe} = \#stages \times \frac{T_{clk,serial}}{T_{clk,pipe}}$$

Pipeline Throughput and Latency
- Throughput: Number of executed instructions per unit of time. That's the parameter we are interested in!
- Latency: Amount of time each instruction takes to execute. Latency increases with the introduction of the pipeline, because every instruction passes through all stages at the pace of the slowest one, plus the overhead of the registers between stages.
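The difference between throughput and latency can be seen with a small numeric sketch. The clock periods below are hypothetical: the pipelined clock is assumed to be slightly slower than the serial one to account for the registers between stages, and the pipeline is assumed to be full and free of stalls.

```python
N_STAGES = 5
T_CLK_SERIAL_NS = 2.0   # hypothetical clock period of the unpipelined machine
T_CLK_PIPE_NS = 2.5     # hypothetical clock period of the pipelined machine

# Latency: time for one instruction to go through all phases.
latency_serial_ns = N_STAGES * T_CLK_SERIAL_NS
latency_pipe_ns = N_STAGES * T_CLK_PIPE_NS

# Throughput: instructions completed per nanosecond in steady state.
throughput_serial = 1 / latency_serial_ns   # one instruction every 5 serial cycles
throughput_pipe = 1 / T_CLK_PIPE_NS         # one instruction every pipeline cycle

print(latency_serial_ns, latency_pipe_ns)    # 10.0 12.5
print(throughput_pipe / throughput_serial)   # 4.0
```

Each instruction takes longer (12.5 ns instead of 10 ns), yet four times as many instructions complete per unit of time.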

Hazards
- Structural: The hardware does not support all combinations of instructions that are to be executed simultaneously.
- Data: Instructions require data that is still being processed by previous instructions in the pipeline.
- Control: A change in the sequence of instructions that is to be executed.
The occurrence of a hazard implies interrupting the execution of all pipeline stages before the one where the hazard occurred: a stall.

Example of a Structural Hazard

             Clock Cycle
Instruction  1  2  3  4  5  6  7  8  9  10
LD           F  D  X  M  W
i + 1           F  D  X  M  W
i + 2              F  D  X  M  W
i + 3                 S  F  D  X  M  W
i + 4                       F  D  X  M  W

S = stall cycle: the fetch of i + 3 is delayed by one cycle while LD is in its memory access stage.
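The table above can be reproduced with a short scheduling sketch. It assumes that the hazard comes from a single memory port shared by instruction fetch (F) and data access (M), which is a common textbook cause but is not stated explicitly in the transcription; the function name and the input format are illustrative.

```python
def fetch_cycles(uses_memory):
    """uses_memory[k] is True if instruction k accesses data memory in its M stage.

    Returns the clock cycle (1-based) in which each instruction is fetched,
    assuming an F D X M W pipeline with one memory port shared by F and M
    and no other source of stalls.
    """
    busy = set()       # cycles in which an M stage occupies the memory port
    fetched = []
    cycle = 1
    for mem in uses_memory:
        while cycle in busy:       # structural hazard: delay the fetch
            cycle += 1
        fetched.append(cycle)
        if mem:
            busy.add(cycle + 3)    # M comes three cycles after F
        cycle += 1
    return fetched

# LD followed by four instructions that do not access data memory.
print(fetch_cycles([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```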

Example of a Data Hazard

                 Clock Cycle
Instruction      1  2  3  4  5  6  7  8  9  10
DADD R1,R2,R3    F  D  X  M  W
DSUB R4,R1,R5       F  S  S  S  D  X  M  W
AND R6,R1,R7           S  S  S  F  D  X  M  W
OR R8,R1,R9                        F  D  X  M
XOR R10,R1,R11                        F  D  X

S = stall cycle: DSUB cannot read R1 in its D stage until DADD has written it in W, and the following instructions are held back behind it.
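The stall pattern in this example can be derived mechanically. The sketch below assumes an in-order F D X M W pipeline with no forwarding, in which a register can only be read in a cycle after the one in which it is written, as the table suggests; the function name and the (dest, sources) instruction format are illustrative.

```python
def decode_cycles(instrs):
    """instrs: list of (dest, [sources]). Returns the clock cycle of each D stage."""
    ready = {}      # register -> first cycle in which its new value can be read
    decode = []
    prev_d = 1      # so that the first instruction decodes in cycle 2
    for dest, srcs in instrs:
        d = prev_d + 1                                   # in order, one D per cycle
        d = max([d] + [ready.get(r, 0) for r in srcs])   # wait for pending writes
        decode.append(d)
        ready[dest] = d + 4     # W happens in cycle d + 3; readable from d + 4
        prev_d = d
    return decode

program = [
    ("R1",  ["R2", "R3"]),    # DADD R1,R2,R3
    ("R4",  ["R1", "R5"]),    # DSUB R4,R1,R5
    ("R6",  ["R1", "R7"]),    # AND  R6,R1,R7
    ("R8",  ["R1", "R9"]),    # OR   R8,R1,R9
    ("R10", ["R1", "R11"]),   # XOR  R10,R1,R11
]
print(decode_cycles(program))  # [2, 6, 7, 8, 9]
```

Without the dependence on R1, DSUB would decode in cycle 3; decoding in cycle 6 instead corresponds to the three stall cycles in the table.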

Example of a Control Hazard

                 Clock Cycle
Instruction      1  2  3  4  5  6  7  8  9  10
DADD R1,R2,R3    F  D  X  M  W
BEQZ R4,ciclo       F  D  X  M  W
AND R6,R7,R8           S  S  S  F  D  X  M  W
OR R8,R1,R9                        F  D  X  M

Which instruction should be fetched after BEQZ? The branch is only resolved in the M stage (cycle 5), so the fetch of the next instruction stalls (S) for three cycles.
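The three-cycle penalty follows directly from the stage in which the branch is resolved. A minimal sketch, assuming that the next instruction is fetched in the cycle right after resolution; the 20% branch fraction is a hypothetical value used only to show the effect on the CPI.

```python
STAGES = "FDXMW"

def branch_penalty(resolve_stage, stages=STAGES):
    # The fetch is repeated once per stage the branch passes after F,
    # up to and including the stage that resolves it.
    return stages.index(resolve_stage)

def cpi_with_branches(branch_fraction, resolve_stage="M"):
    # One cycle per instruction plus the average branch stall cycles.
    return 1 + branch_fraction * branch_penalty(resolve_stage)

print(branch_penalty("M"))                 # 3
print(branch_penalty("D"))                 # 1
print(round(cpi_with_branches(0.2), 2))    # 1.6
```

Resolving the branch earlier in the pipeline would shorten the penalty, at the cost of extra hardware in that stage.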

Real case: But...

$$\text{Speedup}_{pipe} = \frac{CPI_{serial}}{CPI_{pipe}} \times \frac{T_{clk,serial}}{T_{clk,pipe}}$$

With $CPI_{serial} = \#stages$ and $CPI_{pipe} = 1 + \#Stalls$, where $\#Stalls$ is the average number of stall cycles per instruction:

$$\text{Speedup}_{pipe} = \#stages \times \frac{T_{clk,serial}}{T_{clk,pipe}} \times \frac{1}{1 + \#Stalls}$$
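The speedup expression translates directly into a small function. The argument values in the example calls are hypothetical and only show how stalls erode the ideal speedup.

```python
def pipeline_speedup(n_stages, t_clk_serial, t_clk_pipe, stalls_per_instruction):
    """Speedup = (CPI_serial / CPI_pipe) * (T_clk_serial / T_clk_pipe)."""
    cpi_serial = n_stages                     # one cycle per phase, unpipelined
    cpi_pipe = 1 + stalls_per_instruction     # ideal CPI of 1 plus average stalls
    return (cpi_serial / cpi_pipe) * (t_clk_serial / t_clk_pipe)

print(pipeline_speedup(5, 1.0, 1.0, 0.0))    # 5.0 (ideal case)
print(pipeline_speedup(5, 1.0, 1.0, 0.25))   # 4.0
```

With equal clock periods, an average of one stall cycle for every four instructions already reduces the speedup from 5 to 4.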

[...] of a program execution in a pipeline;
Hazards in the pipeline:
- Structural hazards;
- Data hazards:
  - Types of hazards;
  - Overcoming the stalls:
    - By software;
    - By writing in opposite edges of the clock cycle;
    - By data forwarding.