COMP2611: Computer Organization. The Pipelined Processor

Similar documents
The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

Processor (II) - pipelining. Hwansoo Han

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Full Datapath. Chapter 4 The Processor 2

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Pipelining. CSC Friday, November 6, 2015

ECE260: Fundamentals of Computer Engineering

Perfect Student CS 343 Final Exam May 19, 2011 Student ID: 9999 Exam ID: 9636 Instructions Use pencil, if you have one. For multiple choice

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

ECS 154B Computer Architecture II Spring 2009

ECEC 355: Pipelining

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4 The Processor 1. Chapter 4A. The Processor

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos

EE557--FALL 1999 MAKE-UP MIDTERM 1. Closed books, closed notes

ECE154A Introduction to Computer Architecture. Homework 4 solution

Processor (I) - datapath & control. Hwansoo Han

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

COSC 6385 Computer Architecture - Pipelining

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

CSEN 601: Computer System Architecture Summer 2014

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

Lecture 7 Pipelining. Peng Liu.

LECTURE 3: THE PROCESSOR

Computer Architecture. Lecture 6.1: Fundamentals of

COMPUTER ORGANIZATION AND DESIGN

Chapter 4 The Processor 1. Chapter 4B. The Processor

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Codeword[1] Codeword[0]

COMPUTER ORGANIZATION AND DESIGN

Pipelined Processor Design

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

Chapter 4. The Processor

CSE Quiz 3 - Fall 2009

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

Lecture 6: Pipelining

Computer Organization and Structure

Design of the MIPS Processor (contd)

CPE 335. Basic MIPS Architecture Part II

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

COMPUTER ORGANIZATION AND DESIGN

SI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Chapter 6 ADMIN. Reading for Chapter 6: 6.1,

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Full Datapath. Chapter 4 The Processor 2

LECTURE 5. Single-Cycle Datapath and Control

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Pipelined Processor Design

CSE 2021 COMPUTER ORGANIZATION

Design of the MIPS Processor

Processor Architecture

CSE 2021 COMPUTER ORGANIZATION

Beyond Pipelining. CP-226: Computer Architecture. Lecture 23 (19 April 2013) CADSL

Working on the Pipeline

EE557--FALL 1999 MIDTERM 1. Closed books, closed notes

CSEE 3827: Fundamentals of Computer Systems

ECE260: Fundamentals of Computer Engineering

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

ECE260: Fundamentals of Computer Engineering

What do we have so far? Multi-Cycle Datapath (Textbook Version)

Pipeline design. Mehran Rezaei

Pipelining. lecture 15. MIPS data path and control 3. Five stages of a MIPS (CPU) instruction. - factory assembly line (Henry Ford years ago)

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Designing a Pipelined CPU

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

EE 457 Unit 6a. Basic Pipelining Techniques

DEE 1053 Computer Organization Lecture 6: Pipelining

Pipelining. Pipeline performance

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

Outline Marquette University

Systems Architecture

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

RISC Pipeline. Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University. See: P&H Chapter 4.6

CS 2506 Computer Organization II Test 2

Computer Architecture

Single Cycle Data Path

Computer Systems Architecture Spring 2016

Transcription:

COMP2611: Computer Organization The 1

2 Background 2

High-Performance Processors 3 Two techniques for designing high-performance processors by exploiting parallelism: Multiprocessing: parallelism among multiple processors Pipelining: implements parallelism between instructions on the same processor In serial T i m e T a s k o r d e r A B C D 6 P M 7 8 9 10 11 12 1 2 A M T i m e 6 P M 7 8 9 10 11 12 1 2 A M T a s k o r d e r Pipeline A B C D

Pipelining Principles 4 Key characteristics: Multiple tasks are processed simultaneously Ideally, these tasks should be independent of each other otherwise we need to make this the case Pipelining does not help the latency of a single task It helps the throughput of the entire workload Completion order in pipelined execution = that in sequential execution How much can a pipeline improve? Potential speedup = number of pipeline stages The pipeline rate is limited by the slowest pipeline stage Unbalanced lengths of pipeline stages can reduce speedup. Why?

Example: Bottleneck in Pipelining 5 4 4 Can I align the pipeline stages as above? Answer: NO, because the tasks executing in parallel are not independent (task 4 overlaps task 4) The condition to align is to make sure NO OVERLAP of any stages? 4 4 4

6 The MIPS Pipeline 6

Pipeline Performance Example 7 Assume we require: - 100 picoseconds for register read or write - 200 picoseconds for all other stages Instruction Instruction fetch Register read ALU op Memory access Register write Total time Load Word (lw) Store Word (sw) 200ps 100 ps 200ps 200ps 100 ps 800ps 200ps 100 ps 200ps 200ps 700ps R-format 200ps 100 ps 200ps 100 ps 600ps Branch (beq) 200ps 100 ps 200ps 500ps

Pipelined vs. Non-pipelined 8 Single-cycle, non-pipelined execution: Total execution time = 2400 ps Pipelined execution Total execution time =1400 ps

The Performance Benefit 9 The instruction delays in the example: 800 ps (single cycle datapath) 1000 ps (pipelined datapath) The instruction throughputs in the example: 1 instruction per 800 ps (single cycle datapath) 1 instruction per 200 ps (long time average for pipelined datapath) Pipelining does not improve the latency of a single instruction, it improves the throughput of the system (i.e., the datapath) In general (ideally), if we have a N stage pipeline: We need N-1 cycles to fill the pipeline, Then one instruction will finish per cycle So the throughput is: Clock Rate x IC/(IC + N - 1), with IC >> N

The Performance Benefit 10 Pipeline speed is limited by the most time consuming pipeline stage, as this stage determines the duration of a clock cycle (why?) A well balanced pipeline is one such that each stage takes the same amount of time to execute. When the pipeline is well balanced : Timebetweeninstructions pipelined = Timebetweeninstructions nonpipelined Number of stages If the pipeline is well balanced, the speedup equals to Number of pipeline stages (a.k.a depth of the pipeline).

MIPS Pipeline 11 Basic idea: take a single-cycle datapath and separate it into 5 pieces. Each piece responsible for a single instruction execution stage.

MIPS ISA for Pipelining 12 ISA design affects the complexity of pipeline implementation. MIPS ISA is designed for pipelining All instruction are of the same length (32-bit) Easy to fetch one instruction in first stage of the pipeline and decode it in the second It has just a few similar instruction formats With the source register fields being located in the same place in all instructions, 2nd stage can read the register file while decoding the type of instruction just fetched Memory operands only appear in loads and stores We can use the execute stage to calculate the memory address and then access memory in the following stage Alignment of memory operands on word boundaries We need not worry about a single data transfer instruction requiring two memory accesses; the data can be transferred between processor and memory in a single pipeline stage

Pipelining Instructions 13 In pipelined processor, Each instruction takes multiple steps Each step is independent of each other and takes different datapath instr #1 processor cycle IF ID EXE MEM WB the entire processor s datapath and modules can be fully utilized instr #2 instr #3 instr #4 instr #5 instr #6 IF ID EXE MEM WB IF ID EXE MEM WB IF ID EXE MEM WB IF ID EXE MEM WB IF ID EXE MEM WB At each cycle, one instruction is fetched and sent to the processor Ideally, after pipeline is fully filled, one instruction completes each cycle

MIPS Pipeline Stages 14 Execution of each instruction is broken into 5 stages: (in the order of execution) IF : Fetch the instruction from memory ID : Instruction decode & register read EX : Perform ALU operation MEM : Memory access (if necessary) WB : Write result back to register Each stage uses a different hardware unit and takes one clock cycle to complete. Instructions can co-exist in the datapath if all of them are in different stages of execution from one another

Pipeline Registers 15 Additional pipeline registers are needed Located between the stages, i.e. IF/ID, ID/EX, EX/MEM, MEM/WB) Hold information produced in the previous cycle Pipeline registers

Pipeline Diagram 16 Every clock cycle, many instructions are simultaneously executing in a single datapath Two common ways of showing the pipeline operations Single-clock-cycle pipeline diagram : shows the pipeline usage in a single cycle, highlight the resources used. An example of lw execution is shown in following pages Multi-clock-cycle pipeline diagram : shows the graph of operation over time.

Single clock cycle diagram: IF stage of lw 17

Single clock cycle diagram: ID stage of lw 18

Single clock cycle diagram: EXE stage of lw 19

Single clock cycle diagram: MEM stage of lw 20

Single clock cycle diagram: WB stage of lw 21

Single clock cycle diagram: WB stage of lw 22 There is a problem with the WB stage of lw! Instruction supplying the write register number is NOT lw!

The corrected Datapath for lw 23 To solve this problem: the write register information is forwarded from the MEM/WB pipeline registers.

Multi-clock-cycle pipeline diagram : traditional view 24 The following diagram shows the execution of a series of instructions.

Multi-clock-cycle pipeline diagram: graphical view 25 The multi-clock-cycle form showing the hardware utilizations.

Single-clock-cycle diagram in CC5 26

The Pipeline operation 27 Ideally One stage begins in every cycle. One stage completes in each cycle. Each instruction takes 5 cycles In each clock cycle, several instructions are active. Different stages are executing different instructions. Difficulty: How to generate the control signals? we need to set the control signals for each pipeline stage for each instruction.

28 Pipelined control 28

Pipelined Control? 29 Let s start with a simple design that views the problem in a greatly simplified way. Use the same controls as the single-cycle datapath, but pipeline them so that the correct control signals are supplied for each stage of the instruction. Pass control signals together with the instruction through the pipeline. Temporarily ignore data dependence related problems (Hazards), and will provide solutions to this problem later.

Control Signals Identified 30

Control Signals For Each Stage 31 The control signals required by each stage are grouped Stage 1: Instruction fetch (IF) no control signals, the instruction is read from the instruction memory and PC is updated to PC+4. Stage 2: Instruction decode and register read (ID) no control signals, instruction is decoded and source operands are read from register file. Stage 3: Execute (EX) RegDst, ALUOp, and ALUSrc. Stage 4: Memory Access (MEM) Branch, MemRead, and MemWrite Stage 5: Write Back (WB) MemToReg and RegWrite

Control Signals For Each Stage 32 The group of control signals and their values for different classes of instructions: Instructions EX MEM WB RegDst ALUOp1 ALUOp2 ALUSrc Branch MemRead MemWrite RegWrite MemToReg R-format 1 1 0 0 0 0 0 1 0 lw 0 0 0 1 0 1 0 1 1 sw X 0 0 1 0 0 1 0 X beq X 0 1 0 1 0 0 0 X

The Pipelined Control: passing the controls 33 Control signals are passed to the next stage only if they are required RegWrite MemToReg Branch MemRead MemWrite RegDst ALUOp1 ALUOp2 ALUSrc

The Pipelined Control: the complete datapath 34