DLX Unpipelined Implementation

LECTURE - 06

DLX Unpipelined Implementation
- Five cycles: IF, ID, EX, MEM, WB
- Branch and store instructions: 4 cycles only
- What is the CPI?
  - $F_{branch} = 0.12$, $F_{store} = 0.05$
  - $CPI = 0.17 \times 4 + 0.83 \times 5 = 5 - 0.17 = 4.83$
- Further reduction in CPI (without pipelining)
  - ALU instructions can finish in 4 cycles too
  - $F_{ALU} = 0.47$
  - $CPI = 4.83 - 0.47 = 4.36$
  - $Speedup = 4.83 / 4.36 \approx 1.1$
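The CPI arithmetic on this slide can be checked with a short sketch. The function name `average_cpi` and the variable names are illustrative and not part of the lecture; the fractions and cycle counts are the ones quoted above.

```python
def average_cpi(mix):
    """mix is a list of (fraction, cycles) pairs whose fractions sum to 1."""
    return sum(fraction * cycles for fraction, cycles in mix)

# Branches (0.12) and stores (0.05) take 4 cycles; everything else takes 5.
f_short = 0.12 + 0.05
cpi_base = average_cpi([(f_short, 4), (1 - f_short, 5)])

# Variant in which ALU instructions (0.47) also finish in 4 cycles.
f_alu = 0.47
cpi_alu4 = average_cpi([(f_short + f_alu, 4), (1 - f_short - f_alu, 5)])

# Same cycle time in both cases, so the speedup is just the CPI ratio.
speedup = cpi_base / cpi_alu4
print(round(cpi_base, 2), round(cpi_alu4, 2), round(speedup, 2))
```

Because the clock cycle is unchanged between the two variants, the ratio of CPIs is the whole speedup.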

Some Remarks
- Any further reduction in CPI will likely increase cycle time
- Some hardware redundancies can be eliminated
  - Use ALU for (PC+4) addition also
  - Same I-cache and D-cache
  - These are minor improvements...
- An alternative single-cycle implementation:
  - Variation in amount of work ==> higher cycle time
  - Hardware unit reuse is not possible

The Basic Pipeline for DLX

      CC1   CC2   CC3   CC4   CC5   CC6   CC7   CC8   CC9
I     IF    ID    EX    MEM   WB
I+1         IF    ID    EX    MEM   WB
I+2               IF    ID    EX    MEM   WB
I+3                     IF    ID    EX    MEM   WB
I+4                           IF    ID    EX    MEM   WB

- That is it?
- Complications:
  - Resource conflicts, register conflicts, branch instructions
  - Exceptions, instruction set issues
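The ideal timing diagram above follows a simple rule: instruction i enters stage s in cycle i + s + 1 (counting both from zero), so n instructions finish in n + 5 - 1 cycles. A minimal sketch, with the helper name `ideal_schedule` chosen here for illustration and no hazards modeled:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def ideal_schedule(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in
    clock cycle c + 1, or '' when the instruction is not in the pipeline."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i starts one cycle after i - 1
        rows.append(row)
    return rows

rows = ideal_schedule(5)
for i, row in enumerate(rows):
    label = "I" if i == 0 else f"I+{i}"
    print(label.ljust(6) + "".join(cell.ljust(6) for cell in row))
```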

The Pipelined Data-path
[Figure: pipelined datapath. Labeled components: PC, adder (PC + 4), instruction memory, mux, IF/ID, register file, sign extender, ID/EX, Zero? test, two muxes feeding the ALU, EX/MEM, data memory, MEM/WB, mux]
- Pipeline registers, or pipeline latches, carry values from one pipe stage to the next

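Some Performance Numerics...
- Unpipelined clock cycle = 10 ns
- $CPI_{ALU} = CPI_{Branch} = 4$, $CPI_{Other} = 5$
- $F_{ALU} = 0.4$, $F_{Branch} = 0.2$, $F_{Other} = 0.4$
- Pipelined clock cycle = 11 ns
- $Speedup = \frac{10 \times (5 - 0.6)}{11 \times 1} = 4$
- For a single-cycle implementation: $T_{IF} = 10$ ns, $T_{ID} = 8$ ns, $T_{EX} = 10$ ns, $T_{MEM} = 10$ ns, $T_{WB} = 7$ ns
- The single-cycle clock is $10 + 8 + 10 + 10 + 7 = 45$ ns, so the speedup from pipelining is $45 / 11 \approx 4.09$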
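Both speedups on this slide come from the same ratio: time per instruction before pipelining divided by time per instruction after. A small sketch of that calculation, assuming an ideal pipelined CPI of 1; the function and variable names are illustrative only.

```python
def speedup(cycle_old_ns, cpi_old, cycle_new_ns, cpi_new):
    """Ratio of average time per instruction, old design over new design."""
    return (cycle_old_ns * cpi_old) / (cycle_new_ns * cpi_new)

# Multi-cycle machine: ALU and branch take 4 cycles, everything else 5.
cpi_multi = 0.4 * 4 + 0.2 * 4 + 0.4 * 5
speedup_vs_multi = speedup(10, cpi_multi, 11, 1)

# Single-cycle machine: one long cycle covering all five stage delays.
stage_ns = [10, 8, 10, 10, 7]
speedup_vs_single = speedup(sum(stage_ns), 1, 11, 1)

print(round(speedup_vs_multi, 2), round(speedup_vs_single, 2))
```

The 11 ns pipelined cycle is the slowest stage (10 ns) plus 1 ns, which is why the speedup stays below the stage count of 5 in both comparisons.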

Pipeline Hazards
- Structural hazards: resource conflict
  - Example: same cache/memory for instruction and data
- Data hazards: same data item being accessed/written in nearby instructions
  - Example:
      ADD R1, R2, R3
      SUB R4, R1, R5
- Control hazards: branch instructions

Structural Hazards
- Usually happen when a unit is not fully pipelined
  - That unit cannot churn out one instruction per cycle
- Or, when a resource has not been duplicated enough
  - Example: same I-cache and D-cache
  - Example: single write-port for register-file
- Usual solution: stall
  - Also called pipeline bubble, or simply bubble

Stalling the Pipeline

      CC1   CC2   CC3   CC4   CC5   CC6   CC7   CC8   CC9   CC10
Load  IF    ID    EX    MEM   WB
I+1         IF    ID    EX    MEM   WB
I+2               IF    ID    EX    MEM   WB
I+3                     STALL IF    ID    EX    MEM   WB
I+4                                 IF    ID    EX    MEM   WB

- What is the slowdown due to stalls caused by such load instructions?
  - CPI without stalls $= 1$
  - CPI with stalls $= 1 + F_{load}$
  - Slowdown $= 1 + F_{load}$
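The slowdown formula can be written as a one-line function. This is a minimal sketch; the load fraction of 0.25 is a hypothetical value picked for illustration and does not come from the lecture.

```python
def cpi_with_stalls(stall_fraction, stall_cycles=1, base_cpi=1.0):
    """CPI when `stall_fraction` of instructions each add `stall_cycles` bubbles."""
    return base_cpi + stall_fraction * stall_cycles

f_load = 0.25  # hypothetical fraction of loads, not from the lecture
cpi = cpi_with_stalls(f_load)
slowdown = cpi / 1.0  # relative to the ideal pipelined CPI of 1
print(cpi, slowdown)
```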

Why Allow Structural Hazards?
- Lower cost: less hardware ==> lower cost
- Shorter latency of unpipelined unit
  - May have other performance benefits
- Data hazards may introduce stalls anyway!
- Suppose the FP unit is unpipelined, and the other instructions have a 5-stage pipeline. What percentage of instructions can be FP, so that the CPI does not increase?
  - 20% can be FP, assuming no clustering of FP instructions
  - Even if clustered, data hazards may introduce stalls anyway
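The 20% figure can be illustrated with a toy model. The sketch below assumes that the unpipelined FP unit is busy for 5 cycles per FP instruction (the slide implies this but does not state it) and that instructions issue in order, one per cycle. The function name `structural_stalls` is hypothetical.

```python
def structural_stalls(is_fp, fp_latency=5):
    """Count stall cycles when FP instructions share one unpipelined unit.

    is_fp is a sequence of booleans, one per instruction in program order.
    Assumption: the FP unit is occupied for fp_latency cycles per FP op.
    """
    stalls = 0
    unit_free_at = 0  # first cycle in which the FP unit can accept a new op
    cycle = 0         # cycle in which the current instruction would issue
    for fp in is_fp:
        if fp:
            if cycle < unit_free_at:
                stalls += unit_free_at - cycle  # wait for the unit to free up
                cycle = unit_free_at
            unit_free_at = cycle + fp_latency
        cycle += 1
    return stalls

spread = [True, False, False, False, False] * 4   # 20% FP, evenly spaced
clustered = [True, True] + [False] * 8            # 20% FP, back to back
print(structural_stalls(spread), structural_stalls(clustered))
```

With evenly spaced FP instructions the unit is always free when the next one arrives, so no bubbles appear; two adjacent FP instructions force the second to wait.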

Data Hazards
- Example:
    ADD R1, R2, R3
    SUB R4, R1, R5
    AND R6, R1, R7
    OR  R8, R1, R9
    XOR R10, R1, R11
- All instructions after ADD depend on R1
- Stalling is a possibility
- Can we do better?

Register File: Reads after Writes
    ADD R1, R2, R3
    SUB R4, R1, R5
    AND R6, R1, R7
    OR  R8, R1, R9
[Figure: pipeline diagram showing each instruction passing through IM, Reg, ALU, DM]

Minimizing Stalls via Forwarding
    ADD R1, R2, R3
    SUB R4, R1, R5
    AND R6, R1, R7
    OR  R8, R1, R9
[Figure: pipeline diagram showing each instruction passing through IM, Reg, ALU, DM]

Data Forwarding for Stores
    ADD R1, R2, R3
    LW  R4, 0(R1)
    SW  12(R1), R4
- Note: no data hazards on memory locations in DLX, since memory references are always in order

Data Hazard Classification
- Read after Write (RAW): use data forwarding to overcome
- Write after Write (WAW): arises only when writes can happen in different pipeline stages

                  CC1   CC2   CC3   CC4   CC5   CC6
LW  R1, 0(R2)     IF    ID    EX    MEM1  MEM2  WB
ADD R1, R2, R3          IF    ID    EX    WB

  - Has other problems as well: structural hazards
- Write after Read (WAR): rare

                  CC1   CC2   CC3   CC4   CC5   CC6
SW  0(R1), R2     IF    ID    EX    MEM1  MEM2  WB
ADD R2, R3, R4          IF    ID    EX    WB

Stalls due to Data Hazard
    LW  R1, 0(R2)
    SUB R4, R1, R5
    AND R6, R1, R7
    OR  R8, R1, R9
[Figure: pipeline diagram showing each instruction passing through IM, Reg, ALU, DM]
- Pipeline interlock is required: to detect hazard and stall

Avoiding such Stalls
- Compiler scheduling:
  - Example: a = b + c; d = e + f;
      LW  R1, b
      LW  R2, c
      LW  R10, e
      ADD R4, R1, R2
      LW  R11, f
      SW  a, R4
      ADD R12, R10, R11
      SW  d, R12
- Without such scheduling, what is the slow-down?
  - $1 + F_{\text{loads causing stalls}}$
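The benefit of scheduling can be counted directly. The sketch below assumes forwarding with a one-cycle load delay, so a stall occurs only when an instruction reads the register loaded by the instruction immediately before it. The unscheduled order is an assumed straightforward translation of the two statements and is not given in the lecture; the scheduled order is the one on the slide.

```python
def load_use_stalls(program):
    """program is a list of (opcode, dest_or_None, sources) tuples.
    Counts one stall for each instruction that reads the destination
    of the load immediately preceding it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls

# Assumed naive order: each statement compiled on its own.
unscheduled = [
    ("LW", "R1", ()), ("LW", "R2", ()),
    ("ADD", "R4", ("R1", "R2")), ("SW", None, ("R4",)),
    ("LW", "R10", ()), ("LW", "R11", ()),
    ("ADD", "R12", ("R10", "R11")), ("SW", None, ("R12",)),
]

# Order from the slide: loads hoisted away from their uses.
scheduled = [
    ("LW", "R1", ()), ("LW", "R2", ()), ("LW", "R10", ()),
    ("ADD", "R4", ("R1", "R2")), ("LW", "R11", ()),
    ("SW", None, ("R4",)), ("ADD", "R12", ("R10", "R11")),
    ("SW", None, ("R12",)),
]

print(load_use_stalls(unscheduled), load_use_stalls(scheduled))
```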

Topics for Next Lecture
- Control hazards
- Exceptions during a pipeline
  - More difficult to deal with
  - Cause more damage