Lecture: Pipelining Basics

Similar documents
Lecture 5: Pipelining Basics

Lecture 2: Pipelining Basics. Today: chapter 1 wrap-up, basic pipelining implementation (Sections A.1 - A.4)

Lecture: Benchmarks, Pipelining Intro. Topics: Performance equations wrap-up, Intro to pipelining

Lecture: Pipeline Wrap-Up and Static ILP

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

COSC 6385 Computer Architecture - Pipelining

Lecture 10: Static ILP Basics. Topics: loop unrolling, static branch prediction, VLIW (Sections )

Lecture 1 An Overview of High-Performance Computer Architecture. Automobile Factory (note: non-animated version)

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

CS 352H Computer Systems Architecture Exam #1 - Prof. Keckler October 11, 2007

Pipeline Architecture RISC

Pipelining: Hazards Ver. Jan 14, 2014

Computer Architecture CS372 Exam 3

5008: Computer Architecture HW#2

DLX Unpipelined Implementation

Lecture 8: Instruction Fetch, ILP Limits. Today: advanced branch prediction, limits of ILP (Sections , )

Lecture: Static ILP. Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

Pipelining. Pipeline performance

Ti Parallel Computing PIPELINING. Michał Roziecki, Tomáš Cipr

ECE 341. Lecture # 15

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

CIS 662: Midterm. 16 cycles, 6 stalls

CSE A215 Assembly Language Programming for Engineers

Lecture: Static ILP. Topics: loop unrolling, software pipelines (Sections C.5, 3.2)

Lecture 6 MIPS R4000 and Instruction Level Parallelism. Computer Architectures S

What is Pipelining. work is done at each stage. The work is not finished until it has passed through all stages.

Lecture 9: Dynamic ILP. Topics: out-of-order processors (Sections )

Readings. H+P Appendix A, Chapter 2.3 This will be partly review for those who took ECE 152

EITF20: Computer Architecture Part2.2.1: Pipeline-1

MIPS ISA AND PIPELINING OVERVIEW Appendix A and C

EE 457 Unit 6a. Basic Pipelining Techniques

ECSE 425 Lecture 6: Pipelining

CS252 Prerequisite Quiz. Solutions Fall 2007

CS Mid-Term Examination - Fall Solutions. Section A.

Pipeline Review. Review

This course provides an overview of the SH-2 32-bit RISC CPU core used in the popular SH-2 series microcontrollers

Lecture 9: Case Study MIPS R4000 and Introduction to Advanced Pipelining Professor Randy H. Katz Computer Science 252 Spring 1996

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

EE 4683/5683: COMPUTER ARCHITECTURE

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

Improving Performance: Pipelining

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Minimizing Data hazard Stalls by Forwarding Data Hazard Classification Data Hazards Present in Current MIPS Pipeline

Practice Assignment 1

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )

Lecture 19 Introduction to Pipelining

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

Week 11: Assignment Solutions

CMSC 411 Practice Exam 1 w/answers. 1. CPU performance Suppose we have the following instruction mix and clock cycles per instruction.

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

Lecture 11: Out-of-order Processors. Topics: more ooo design details, timing, load-store queue

Spring 2014 Midterm Exam Review

Computer Architecture

Lecture 19. Software Pipelining. I. Example of DoAll Loops. I. Introduction. II. Problem Formulation. III. Algorithm.

Lecture: Out-of-order Processors. Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Pipelining: Basic Concepts

EITF20: Computer Architecture Part2.2.1: Pipeline-1

L19 Pipelined CPU I 1. Where are the registers? Study Chapter 6 of Text. Pipelined CPUs. Comp 411 Fall /07/07

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Advanced Computer Architecture

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Introduction to Pipelining. Silvina Hanono Wachman Computer Science & Artificial Intelligence Lab M.I.T.

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010

Pipelining. Maurizio Palesi

Lecture 4: Instruction Set Design/Pipelining

Chapter 4. The Processor

Pipelined CPUs. Study Chapter 4 of Text. Where are the registers?

Exploiting ILP with SW Approaches. Aleksandar Milenković, Electrical and Computer Engineering University of Alabama in Huntsville

T T T T T T N T T T T T T T T N T T T T T T T T T N T T T T T T T T T T T N.

Lecture 16: Core Design. Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue

LECTURE 10. Pipelining: Advanced ILP

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Background: Pipelining Basics. Instruction Scheduling. Pipelining Details. Idealized Instruction Data-Path. Last week Register allocation

Instruction Pipelining Review

Chapter 4. The Processor

Lecture 15: Pipelining. Spring 2018 Jason Tang

C.1 Introduction. What Is Pipelining? C-2 Appendix C Pipelining: Basic and Intermediate Concepts

Pipelined Processors. Ideal Pipelining. Example: FP Multiplier. 55:132/22C:160 Spring Jon Kuhl 1

CISC 662 Graduate Computer Architecture Lecture 5 - Pipeline. Pipelining. Pipelining the Idea. Similar to assembly line in a factory:

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

Floating Point/Multicycle Pipelining in DLX

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Performance. CS 3410 Computer System Organization & Programming. [K. Bala, A. Bracy, E. Sirer, and H. Weatherspoon]

(Basic) Processor Pipeline

ECE 486/586. Computer Architecture. Lecture # 12

COSC 6385 Computer Architecture - Pipelining (II)

Course web site: teaching/courses/car. Piazza discussion forum:

CSCE 212: FINAL EXAM Spring 2009

Structure of Computer Systems

Slides for Lecture 15

Designing for Performance. Patrick Happ Raul Feitosa

Transcription:

Lecture: Pipelining Basics Topics: Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4: Loads/Stores and RISC/CISC Video 5: Hazards Video 6: Examples of Hazards 1

Building a Car Unpipelined Start and finish a job before moving to the next Jobs Time 2

The Assembly Line Pipelined Break the job into smaller stages A B C A B C Jobs A B C A B C Time 3

Clocks and Latches Stage 1 Stage 2 4

Clocks and Latches Stage 1 L Stage 2 L Clk 5

Some Equations Unpipelined: time to execute one instruction = T + Tovh For an N-stage pipeline, time per stage = T/N + Tovh Total time per instruction = N (T/N + Tovh) = T + N Tovh Clock cycle time = T/N + Tovh Clock speed = 1 / (T/N + Tovh) Ideal speedup = (T + Tovh) / (T/N + Tovh) Cycles to complete one instruction = N Average CPI (cycles per instr) = 1 6

Problem 1 An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 equal sequential pipeline stages. Answer the following, assuming that there are no stalls in the pipeline. What are the cycle times in the two processors? What are the clock speeds? What are the IPCs? How long does it take to finish one instr? What is the speedup from pipelining? 7

Problem 1 An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 equal sequential pipeline stages. Answer the following, assuming that there are no stalls in the pipeline. What are the cycle times in the two processors? 5.2ns and 1.2ns What are the clock speeds? 192 MHz and 833 MHz What are the IPCs? 1 and 1 How long does it take to finish one instr? 5.2ns and 6ns What is the speedup from pipelining? 833/192 = 4.34 8

Problem 2 An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 sequential pipeline stages. The stages have the following lengths: 1ns; 0.6ns; 1.2ns; 1.4ns; 0.8ns. Answer the following, assuming that there are no stalls in the pipeline. What is the cycle time in the new processor? What is the clock speed? What is the IPC? How long does it take to finish one instr? What is the speedup from pipelining? What is the max speedup from pipelining? 9

Problem 2 An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 sequential pipeline stages. The stages have the following lengths: 1ns; 0.6ns; 1.2ns; 1.4ns; 0.8ns. Answer the following, assuming that there are no stalls in the pipeline. What is the cycle time in the new processor? 1.6ns What is the clock speed? 625 MHz What is the IPC? 1 How long does it take to finish one instr? 8ns What is the speedup from pipelining? 625/192 = 3.26 What is the max speedup from pipelining? 5.2/0.2 = 26 10

A 5-Stage Pipeline 11 Source: H&P textbook

A 5-Stage Pipeline Use the PC to access the I-cache and increment PC by 4 12

A 5-Stage Pipeline Read registers, compare registers, compute branch target; for now, assume branches take 2 cyc (there is enough work that branches can easily take more) 13

A 5-Stage Pipeline ALU computation, effective address computation for load/store 14

A 5-Stage Pipeline Memory access to/from data cache, stores finish in 4 cycles 15

A 5-Stage Pipeline Write result of ALU computation or load into register file 16

RISC/CISC Loads/Stores 17

Problem 3 Convert this C code into equivalent RISC assembly instructions a[i] = b[i] + c[i]; 18

Problem 3 Convert this C code into equivalent RISC assembly instructions a[i] = b[i] + c[i]; LD [R1], R2 # R1 has the address for variable i MUL R2, 8, R3 # the offset from the start of the array ADD R4, R3, R7 # R4 has the address of a[0] ADD R5, R3, R8 # R5 has the address of b[0] ADD R6, R3, R9 # R6 has the address of c[0] LD [R8], R10 # Bringing b[i] LD [R9], R11 # Bringing c[i] ADD R10, R11, R12 # Sum is in R12 ST [R7], R12 # Putting result in a[i] 19

Problem 4 Design your own hypothetical 8-stage pipeline. 20

Title Bullet 21