ELE 655 Microprocessor System Design

Similar documents
The Processor (3) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Pipeline Hazards. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Full Datapath. Chapter 4 The Processor 2

Processor (II) - pipelining. Hwansoo Han

Chapter 4. The Processor

Lecture Topics. Announcements. Today: Data and Control Hazards (P&H ) Next: continued. Exam #1 returned. Milestone #5 (due 2/27)

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

Full Datapath. Chapter 4 The Processor 2

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

LECTURE 3: THE PROCESSOR

COMPUTER ORGANIZATION AND DESIGN

CSEE 3827: Fundamentals of Computer Systems

Chapter 4 The Processor 1. Chapter 4B. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Thomas Polzer Institut für Technische Informatik

COMPUTER ORGANIZATION AND DESIGN

Chapter 4 The Processor 1. Chapter 4A. The Processor

Pipelining. CSC Friday, November 6, 2015

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Chapter 4. The Processor

Pipelined datapath Staging data. CS2504, Spring'2007 Dimitris Nikolopoulos

ECE260: Fundamentals of Computer Engineering

zhandling Data Hazards The objectives of this module are to discuss how data hazards are handled in general and also in the MIPS architecture.

Data Hazards Compiler Scheduling Pipeline scheduling or instruction scheduling: Compiler generates code to eliminate hazard

Chapter 4. The Processor

Chapter 4. The Processor

Outline Marquette University

EIE/ENE 334 Microprocessors

LECTURE 9. Pipeline Hazards

ECE473 Computer Architecture and Organization. Pipeline: Data Hazards

14:332:331 Pipelined Datapath

ECE473 Computer Architecture and Organization. Pipeline: Control Hazard

Chapter 4. The Processor

ECE260: Fundamentals of Computer Engineering

Orange Coast College. Business Division. Computer Science Department. CS 116- Computer Architecture. Pipelining

CENG 3420 Lecture 06: Pipeline

ECE232: Hardware Organization and Design

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

Pipelined Processor Design

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

DEE 1053 Computer Organization Lecture 6: Pipelining

Midnight Laundry. IC220 Set #19: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Return to Chapter 4

Outline. A pipelined datapath Pipelined control Data hazards and forwarding Data hazards and stalls Branch (control) hazards Exception

Chapter 4. The Processor. Jiang Jiang

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS 251, Winter 2019, Assignment % of course mark

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

COSC 6385 Computer Architecture - Pipelining

CS 251, Winter 2018, Assignment % of course mark

COMPUTER ORGANIZATION AND DESI

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

Processor (IV) - advanced ILP. Hwansoo Han

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

CS 2506 Computer Organization II Test 2. Do not start the test until instructed to do so! printed

ECS 154B Computer Architecture II Spring 2009

Pipeline Review. Review

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

Unresolved data hazards. CS2504, Spring'2007 Dimitris Nikolopoulos

Pipelining is Hazardous!

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Control Hazards - branching causes problems since the pipeline can be filled with the wrong instructions.

CS2100 Computer Organisation Tutorial #10: Pipelining Answers to Selected Questions

Computer Architecture

The Processor Pipeline. Chapter 4, Patterson and Hennessy, 4ed. Section 5.3, 5.4: J P Hayes.

ECE154A Introduction to Computer Architecture. Homework 4 solution

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

Instr. execution impl. view

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Instruction Pipelining

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Advanced Instruction-Level Parallelism

Pipelined Processor Design

Chapter 6 Exercises with solutions

Lecture 2: Processor and Pipelining 1

MIPS An ISA for Pipelining

The Processor: Instruction-Level Parallelism

Basic Pipelining Concepts

Basic Instruction Timings. Pipelining 1. How long would it take to execute the following sequence of instructions?

(Basic) Processor Pipeline

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

Computer Organization and Structure

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9 Pipeline and Cache

The Processor: Improving the performance - Control Hazards

Chapter 3. Pipelining. EE511 In-Cheol Park, KAIST

EE 4980 Modern Electronic Systems. Processor Advanced

Pipeline Data Hazards. Dealing With Data Hazards

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Computer Architecture Spring 2016

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Static, multiple-issue (superscaler) pipelines

Transcription:

ELE 655 Microprocessor System Design Section 2 Instruction Level Parallelism Class 1

Basic Pipeline Notes: Reg shows up two places but actually is the same register file Writes occur on the second half of the clock cycle, reads on the first half 2 tj

Basic Pipeline Implementation Notes: Writes occur on the second half of the clock cycle, reads on the first half 3 tj

Basic Pipeline with Control 4 tj

Pipeline Hazards Hazards are conditions where the next instruction cannot perform its assigned pipeline action in the next clock cycle 3 types Structural Data Control 5 tj

Structural Hazards These hazards result from a resource conflict Classic case is Harvard vs. vonneuman memory architectures vonneuman architectures share a single memory for program and data A lw or sw command requires access to data memory to load or store the data value It would not be possible to fetch the appropriate instruction during this clock cycle since the memory would be in use The IF would be stalled and a bubble would be created in the pipeline 6 tj

Structural Hazards vonneuman memory architecture Time 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 IF LW 2 3 Stall 4 5 6 7 8 9 10 11 12 13 14 ID LW 2 3 bubble 4 5 6 7 8 9 10 11 12 13 EX LW 2 3 bubble 4 5 6 7 8 9 10 11 12 MEM LW 2 3 bubble 4 5 6 7 8 9 10 11 WB data memory access prevents a concurrent LW instruction 2 fetch 3 bubble 4 5 6 7 8 9 10 7 tj

Structural Hazards Too few registers Complex instructions multi resource requirements (dual write) Non-pipelined functions Mismatched pipelines 8 tj

Data Hazards These hazards result from a dependence of one instruction on another instruction still in the pipeline Consider the following code snippit add $s0, $t0, $t1 sub $t2, $s0, $t3 The value of $s0 is needed to perform the subtraction 9 tj

Data Hazards add $s0, $t0, $t1 sub $t2, $s0, $t3 Time 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 IF add sub 3 3 4 5 6 7 8 9 10 11 12 13 14 ID add stall stall 3 4 5 6 7 8 9 10 11 12 13 EX add bubble bubble 3 4 5 6 7 8 9 10 11 12 MEM add bubble bubble 3 4 5 6 7 8 9 10 11 WB add bubble bubble 3 4 5 6 7 8 9 10 2 clock cycle bubbles are created It would be 3 bubbles except we can take advantage of our convention writes occur in the first half of the clock cycle reads occur in the second half of the clock cycle the WB occurs during the same clock cycle as the register read 10 tj

Data Hazards add $s0, $t0, $t1 sub $t2, $s0, $t3 2 clock cycle bubbles are created It would be 3 bubbles except we can take advantage of our convention writes occur in the first half of the clock cycle reads occur in the second half of the clock cycle the WB occurs during the same clock cycle as the register read 11 tj

Data Hazards In many cases the compiler can avoid a data hazard add $s0, $t0, $t1 sub $t2, $s0, $t3 or $s2, $t0, $t1 and $s3, $t0, $t3 add $s4, $t1, $t3 add $s0, $t0, $t1 or $s2, $t0, $t1 and $s3, $t0, $t3 add $s4, $t1, $t3 sub $t2, $s0, $t3 re-order the instruction to remove the hazard condition 12 tj

Data Hazards Hardware can also be used to avoid data hazards called forwarding or bypassing provide the needed data as soon as it is valid requires extra circuitry 13 tj

Data Hazards 14 tj

Detecting the need for Forwarding Conditions: 1a) EX/MEM.RegisterRd = ID/EX.RegisterRs EX/MEM currently holds a value needed by an instruction about to enter EX 1b) ) EX/MEM.RegisterRd = ID/EX.RegisterRt EX/MEM currently holds a value needed by an instruction about to enter EX 2a) MEM/WB.RegisterRd = ID/EX.RegisterRs MEM/WB currently holds a value needed by an instruction about to enter EX 2b) MEM/WB.RegisterRd = ID/EX.RegisterRt MEM/WB currently holds a value needed by an instruction about to enter EX 15 tj

Pipeline with Forwarding 16 tj

Data Hazards Hardware cannot avoid all data hazards cannot go backwards in time! lw $s0, 20($t1) sub $t2, $s0, $t3 17 tj

Data Hazards Forwarding plus compiler optimizations can avoid additional data hazards stall stall lw $t1, 0($t0) lw $t2, 4($t0) add $t3, $t1, $t2 sw $t3, 12($t0) lw $t4, 8($t0) add $t5, $t1, $t4 sw $t5, 16($t0) 13 cycles lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2 sw $t3, 12($t0) add $t5, $t1, $t4 sw $t5, 16($t0) 11 cycles 18 tj

Stalling the pipeline and inserting a bubble 19 tj

Branch Hazard Consider the following code snippit beq $1, $3, skip and $13, $6, $2 add $14, $2, $2 skip lw $4, 50($7) The branch decision is known after the calculation of the branch address and the comparison (subtract and check for zero), and is available in the MEM stage If the branch is ignored we will have the and, add and lw instructions in the pipeline all is well If the branch is taken we will have the and, add and lw instructions in the pipeline but we do not want them to execute 20 tj

Addition of Branch Logic 21 tj

Branch Hazard In our current implementation We assume branches are not taken We would need to flush the pipeline for taken branches Branch decision is available in the MEM stage Assuming the branch target is not already in the pipeline inserting 3 bubbles into the pipeline? How far can we move the decision forward to reduce the impact of taken branches 22 tj

Branch Hazard Most branches are simple comparisons equal all bits the same negative/positive look at msb We can move most of the branch prediction logic forward to the ID stage We have register values (some may be forwarded!) Need to modify the forwarding logic to account for branches We can move the branch address calculation forward to the ID stage Still have a single cycle stall IF of the next instruction is occurring in parallel with ID detection of the taken branch 23 tj

Early Branch Detection 24 tj

Branch Hazard This approach reduces the impact of branch hazards There may still be data hazards that cannot be avoided Branch dependent on previous lw instruction lw $1, 50($7) beq $1, $4, skip the lw result will not be available until the MEM cycle is complete 2 stall cycles 25 tj

Branch Prediction For deeper pipelines the cost of missing a branch decision can be significant many clock cycles lost Leads to branch prediction Static Compile Time Dynamic Execution time 26 tj

Static Branch Prediction Freeze the pipeline until decision known Do not allow additional instructions until branch decision known Insert No-op instruction(s) Assume NOT taken Allow sequential instruction into pipeline HW must prevent commits until decision known Must have HW to nop currently executing instructions This is our simple pipeline solution Compiler can help by ordering instructions to match the HW Assume Taken As soon as target address available start reading instructions Used when complicated conditional instructions are part of the IS 27 tj

Static Branch Prediction Delayed Branch Branch instruction Sequential successor Branch target if taken (always executed instruction) - delay slot 28 tj

Static Branch Prediction Delayed Branch Branch instruction Sequential successor Branch target if taken Must be OK whether branch taken or not 29 tj

Static Branch Prediction Better than nothing but not good enough for complex pipelines 30 tj

Dynamic Branch Prediction 1 bit Use a small branch prediction buffer n words deep 1 bit of prediction value (1 bit word) n is derived from the PC value last 8 bits of PC 256 words deep The PC value references one of n predictions values Assuming a branch instruction Take the branch if the prediction value is set to 1 Don t take the branch if the prediction value is set to 0 If the prediction was wrong invert the prediction value 31 tj

Dynamic Branch Prediction 1 bit PC Branch Table ADDR B 0x00000000 0 0x00000001 0 0x00000010 1 0x11111110 1 0x11111111 1 32 tj

Dynamic Branch Prediction 1 bit Issues Multiple PC values point to the same branch table location over write each other wrong guesses Each incorrect guess can lead to 2 wrong guesses eg. Assume mostly loop back bit set to 1 when you do not loop back you stall and set bit to 0 next cycle you want to loop back but bit is 0 stall and set bit to 1 2 stalls 33 tj

Control Hazards Dynamic Branch Prediction 2 bit Use 2 bits to make prediction decisions Only change the prediction on 2 successive mispredictions Resolves the 2 stall issue of 1 bit prediction PC Branch Table ADDR B1 B0 0x00000000 0 0 0x00000001 0 0 0x00000010 1 0 Option: Include bits in cache block instead of separate location issue? 0x11111110 0 1 0x11111111 1 1 34 tj

Dynamic Branch Prediction 2 bit 35 tj

Dynamic Branch Prediction 2 bit 36 tj