SI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Chapter 6 ADMIN. Reading for Chapter 6: 6.1,

Similar documents
Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Chapter Six. Dataı access. Reg. Instructionı. fetch. Dataı. Reg. access. Dataı. Reg. access. Dataı. Instructionı fetch. 2 ns 2 ns 2 ns 2 ns 2 ns

Pipelining. Ideal speedup is number of stages in the pipeline. Do we achieve this? 2. Improve performance by increasing instruction throughput ...

Pipelining. CSC Friday, November 6, 2015

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Full Datapath. Chapter 4 The Processor 2

1 Hazards COMP2611 Fall 2015 Pipelined Processor

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Processor (II) - pipelining. Hwansoo Han

COMP2611: Computer Organization. The Pipelined Processor

Chapter 4 (Part II) Sequential Laundry

Outline Marquette University

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

ECE154A Introduction to Computer Architecture. Homework 4 solution

ECE260: Fundamentals of Computer Engineering

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

Chapter 4. The Processor

Instruction word R0 R1 R2 R3 R4 R5 R6 R8 R12 R31

COSC 6385 Computer Architecture - Pipelining

LECTURE 3: THE PROCESSOR

Chapter 4 The Processor 1. Chapter 4B. The Processor

Project 2: Pipelining (10%) Purpose. Pipelines. ENEE 646: Digital Computer Design, Fall 2017 Assigned: Wednesday, Sep 6; Due: Tuesday, Oct 3

Full Datapath. Chapter 4 The Processor 2

Lecture 6: Pipelining

14:332:331 Pipelined Datapath

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

CSEE 3827: Fundamentals of Computer Systems

ECE232: Hardware Organization and Design

Designing a Pipelined CPU

Pipeline Data Hazards. Dealing With Data Hazards

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

COMPUTER ORGANIZATION AND DESIGN

Designing a Pipelined CPU

ECEC 355: Pipelining

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ECE331: Hardware Organization and Design

Chapter 4. The Processor

CENG 3531 Computer Architecture Spring a. T / F A processor can have different CPIs for different programs.

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

ECE331: Hardware Organization and Design

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Computer Architecture. Lecture 6.1: Fundamentals of

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Pipelining: Overview. CPSC 252 Computer Organization Ellen Walker, Hiram College

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Processor Architecture

Thomas Polzer Institut für Technische Informatik

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Pipeline design. Mehran Rezaei

Modern Computer Architecture

The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

ECS 154B Computer Architecture II Spring 2009

Chapter 4 The Processor 1. Chapter 4A. The Processor

ELE 655 Microprocessor System Design

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

COMPUTER ORGANIZATION AND DESIGN

CS 2506 Computer Organization II Test 2

Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos

Pipelining. Maurizio Palesi

Improve performance by increasing instruction throughput

CPE 335 Computer Organization. Basic MIPS Pipelining Part I

CSE 490/590 Computer Architecture Homework 2

ECE 154A Introduction to. Fall 2012

CS 251, Winter 2019, Assignment % of course mark

CS 251, Winter 2018, Assignment % of course mark

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

ECE 313 Computer Organization FINAL EXAM December 13, 2000

CSC258: Computer Organization. Microarchitecture

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CS232 Final Exam May 5, 2001

How to design a controller to produce signals to control the datapath

Pipeline Control Hazards and Instruction Variations

What do we have so far? Multi-Cycle Datapath (Textbook Version)

Computer Architecture

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

Instruction Pipelining

Instruction Pipelining

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture.

COMPUTER ORGANIZATION AND DESI

Short Answer: [3] What is the primary difference between Tomasulo s algorithm and Scoreboarding?

Slides for Lecture 15

Lecture 7 Pipelining. Peng Liu.

Transcription:

SI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life Chapter 6 ADMIN ing for Chapter 6: 6., 6.9-6.2 2

Midnight Laundry Task order A 6 PM 7 8 9 0 2 2 AM B C D 3 Smarty Laundry Task order A 6 PM 7 8 9 0 2 2 AM B C D 6 PM 7 8 9 0 2 2 AM Task order A B C D 4

Pipelining Improve performance by increasing instruction throughput Program eecution order (in instructions) lw $, 00($0) lw $2, 200($0) lw $3, 300($0) Program eecution order (in instructions) lw $, 00($0) lw $2, 200($0) 200 ps lw $3, 300($0) 200 400 600 800 000 200 400 600 800 fetch Data access fetch 800 ps fetch 800 ps Data access 200 400 600 800 000 200 400 fetch 200 ps fetch Data access Data access Data access fetch 800 ps 200 ps 200 ps 200 ps 200 ps 200 ps Ideal speedup is number of stages in the pipeline. Do we achieve this? 5 Basic Idea IF: fetch ID: decode/ register file read EX: Eecute/ address calculation MEM: Memory access : back Add 4 Shift left 2 ADD Add result 0 Mu register PC Address Zero register 2 Address isters 0 result Mu Data register 2 Memory memory Mu 0 6 32 Sign etend 6

0 Mu Pipelined Datapath IF/ID ID/EX EX/MEM MEM/ Add 4 Shift left 2 Add Add result PC Address memory register register 2 isters register 2 0 Mu Zero result Address Data memory 0 Mu 6 Sign 32 etend 7 Pipeline Diagrams Clock cycle: 200 400 600 800 000 2 3 4 5 6 7 add $s0, $t0, $t IF ID EX MEM add $s0, $s, $s2 200 400 600 800 000 sub $a, $s2, $a3 add $s0, $t0, $t IF ID EX MEM 200 400 600 800 add $t0, $t, $t2 add $s0, $t0, $t IF ID EX MEM Assumptions: s to memory or register file in 2 nd half of clock cycle s to memory or register file in st half of clock cycle What could go wrong? 8

Problem: Dependencies Problem with starting net instruction before first is finished Clock cycle: 200 2 400 3 600 4 800 5 000 6 7 8 sub add $s0, $s0, $s, $t0, $t $s2 IF ID EX MEM 200 400 600 800 000 and $a, $s0, $a3 add $s0, $t0, $t IF ID EX MEM 200 400 600 800 000 add $t0, $t, $s0 or $t2, $s0, $s0 add $s0, $t0, $t IF ID EX 200 400 MEM 600 800 add $s0, $t0, $t IF ID EX MEM Dependencies that go backward in time are Will the or instruction work properly? 9 Solution: Forwarding Use temporary results, don t wait for them to be written Clock cycle: 200 2 400 3 600 4 800 5 000 6 7 8 sub add $s0, $s0, $s, $t0, $t $s2 IF ID EX MEM 200 400 600 800 000 and $a, $s0, $a3 add $s0, $t0, $t IF ID EX MEM 200 400 600 800 000 add $t0, $t, $s0 or $t2, $s0, $s0 add $s0, $t0, $t IF ID EX 200 400 MEM 600 800 add $s0, $t0, $t IF ID EX MEM Where do we need this? Will this deal with all hazards? 0

Problem? Clock cycle: lw $t0, add 0($s) $s0, $t0, $t IF 200 2 ID 400 3 EX 600 4 MEM 800 5 000 6 7 200 400 600 800 000 sub $a, $t0, $a3 add $s0, $t0, $t IF ID EX MEM 200 400 600 800 0 add $a2, $t0, $t2 add $s0, $t0, $t IF ID EX MEM Forwarding not enough When an instruction tries to a register following a to the same register. Solution: Stall later instruction until result is ready Clock cycle: 2 3 4 5 6 7 lw $t0, 0($s) sub $a, $t0, $a3 add $a2, $t0, $t2 Why does the stall start after ID stage? 2

Assumptions For eercises/eams/everything assume The MIPS 5-stage pipeline That we have forwarding unless told otherwise 3 Eercise # Pipeline diagrams Draw a pipeline stage diagram for the following sequence of instructions. Start at cycle #. You don t need fancy pictures just tet for each stage: ID, MEM, etc. add $s, $s3, $s4 lw $v0, 0($a0) sub $t0, $t, $t2 What is the total number of cycles needed to complete this sequence? What is the doing during cycle #4? When does the sub instruction writeback its result? When does the lw instruction access memory? 4

Eercise #2 Data hazards Consider this code:. add $s, $s3, $s4 2. add $v0, $s, $s3 3. sub $t0, $v0, $t2 4. and $a0, $v0, $s. Draw lines showing all the dependencies in this code 2. Which of these dependencies do not need forwarding to avoid stalling? 5 Eercise #3 Data hazards Draw a pipeline diagram for this code. Show stalls where needed.. add $s, $s3, $s4 2. lw $v0, 0($s) 3. sub $v0, $v0, $s 6

Eercise #4 More Data hazards Draw a pipeline diagram for this code. Show stalls where needed.. lw $s, 0($t0) 2. lw $v0, 0($s) 3. sw $v0, 4($s) 4. sw $t0, 0($t) 7 Eercise #5 Stretch What might be a problem with pipelining the following code? beq $a0, $a, Else lw $v0, 0($s) sw $v0, 4($s) Else: add $a, $a2, $a3 8

Eercise #6 Stretch This diagram (from before) has a serious bug. What is it? IF/ID ID/EX EX/MEM MEM/ Add 4 Shift left 2 Add Add result 0 M u PC Address memory register register 2 isters register 2 0 Mu Zero result Address Data memory 0 Mu 6 Sign 32 etend 9 Big Picture Remember the single-cycle implementation Inefficient because low utilization of hardware resources Each instruction takes one long cycle Two possible ways to improve on this: Multicycle Pipelined Clock cycle time (vs. single cycle) Amount of hardware used (vs. single cycle) Split instruction into multiple stages ( per cycle)? Each stage has its own set of hardware? How many instructions eecuting at once? 20

The Pipeline Parado Pipelining does not the eecution time of any instruction But by instruction eecution, it can greatly improve performance by the 2 Implementing Pipelining What makes it easy? all instructions are the same length just a few instruction formats memory operands appear only in loads and stores What makes it hard? hazards structural hazards control hazards What make it really hard? eception handling Improving performance with out-of-order eecution, etc. 22

Structural Hazards Occur when the hardware can t support the combination of instructions that we want to eecute in the same clock cycle MIPS instruction set designed to reduce this problem But could occur if: 23 Control Hazards What might be a problem with pipelining the following code? beq $a0, $a, Else lw $v0, 0($s) sw $v0, 4($s) Else: add $a, $a2, $a3 What other kinds of instructions would cause this problem? 24

Control Hazard Strategy #: Predict not taken What if we are wrong? Assume branch target and decision known at end of ID cycle. Show a pipeline diagram for when branch is taken. beq $a0, $a, Else lw $v0, 0($s) sw $v0, 4($s) Else: add $a, $a2, $a3 25 Control Hazard Strategies. Predict not taken One cycle penalty when we are wrong not so bad Penalty gets bigger with longer pipelines bigger problem 2. 3. 26

Branch Prediction Taken Predict taken Not taken Taken Predict taken Taken Not taken Predict not taken Not taken Taken Predict not taken Not taken With more sophistication can get 90-95% accuracy Good prediction key to enabling more advanced pipelining techniques! 27 Pipeline Control Generate control signal during the stage control signals along just like the Eecution/Address Calculation stage control lines Memory access stage control lines -back stage control lines Dst Op Op0 Src Branch Mem Mem write Mem to R-format 0 0 0 0 0 0 lw 0 0 0 0 0 sw X 0 0 0 0 0 X beq X 0 0 0 0 0 X Control M EX M IF/ID ID/EX EX/MEM MEM/ 28

Code Scheduling to Improve Performance Can we avoid stalls by rescheduling? lw $t0, 0($t) add $t2, $t0, $t2 lw $t3, 4($t) add $t4, $t3, $t4 Dynamic Pipeline Scheduling Hardware chooses which instructions to eecute net Will eecute instructions out of order (e.g., doesn t wait for a dependency to be resolved, but rather keeps going!) Speculates on branches and keeps the pipeline full (may need to rollback if prediction incorrect) 29 Dynamic Pipeline Scheduling Let hardware choose which instruction to eecute net (might eecute instructions out of program order) Why might hardware do better job than programmer/compiler? Eample # Eample #2 lw $t0, 0($t) add $t2, $t0, $t2 lw $t3, 4($t) add $t4, $t3, $t4 sw $s0, 0($s3) lw $t0, 0($t) add $t2, $t0, $t2 30

Eercise # Can you rewrite this code to eliminate stalls?. lw $s, 0($t0) 2. lw $v0, 0($s) 3. sw $v0, 4($s) 4. add $t0, $t, $t2 3 Eercise #2 Show a pipeline diagram for the following code, assuming: The branch is predicted not taken The branch actually is taken lw $t, 0($t0) beq $s, $s2, Label2 sub $v0, $v, $v2 Label2: add $t0, $t, $t2 32

Eercise #3 True or False?. A pipelined implementation will have a faster clock rate than a comparable single cycle implementation 2. Pipelining increases performance by splitting up each instruction into stages, thereby decreasing the time needed to eecute each instruction. 3. A structural hazard could occur if an instruction produce two results that needed to be written to the register file. 4. Backwards branches are likely to not be taken 33 Eercise #4 Stretch What is problematic about the following code? Show a pipeline diagram assume branch is predicted not taken, but is taken. lw $s, 0($t0) beq $s, $s2, Label2 sub $v0, $v, $v2 Label2: add $t0, $t, $t2 34