CS61C : Machine Structures

Similar documents
Pipeline design. Mehran Rezaei

Designing a Multicycle Processor

CS420/520 Homework Assignment: Pipelining

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

CS61C : Machine Structures

CS 152 Computer Architecture and Engineering. Lecture 10: Designing a Multicycle Processor

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

CPU Design Steps. EECC550 - Shaaban

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Pipelining. CSC Friday, November 6, 2015

Lecture #17: CPU Design II Control

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Major CPU Design Steps

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

361 control.1. EECS 361 Computer Architecture Lecture 9: Designing Single Cycle Control

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

Pipeline Overview. Dr. Jiang Li. Adapted from the slides provided by the authors. Jiang Li, Ph.D. Department of Computer Science

Lecture 7 Pipelining. Peng Liu.

CS61C : Machine Structures

CS3350B Computer Architecture Quiz 3 March 15, 2018

Computer Architecture

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)

Pipelined CPUs. Study Chapter 4 of Text. Where are the registers?

What do we have so far? Multi-Cycle Datapath (Textbook Version)

ECE170 Computer Architecture. Single Cycle Control. Review: 3b: Add & Subtract. Review: 3e: Store Operations. Review: 3d: Load Operations

MIPS-Lite Single-Cycle Control

CS61C : Machine Structures

Chapter 4. The Processor

ECE4680 Computer Organization and Architecture. Designing a Pipeline Processor

CS61C : Machine Structures

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

CS 110 Computer Architecture Single-Cycle CPU Datapath & Control

CS152 Computer Architecture and Engineering. Lecture 8 Multicycle Design and Microcode John Lazzaro (

UC Berkeley CS61C : Machine Structures

Very Simple MIPS Implementation

Review: Pipelining. Key to pipelining: smooth flow Making all instructions the same length can increase performance!

ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

L19 Pipelined CPU I 1. Where are the registers? Study Chapter 6 of Text. Pipelined CPUs. Comp 411 Fall /07/07

Very Simple MIPS Implementation

EEM 486: Computer Architecture. Lecture 3. Designing Single Cycle Control

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

Design a MIPS Processor (2/2)

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

CSE 141 Computer Architecture Summer Session Lecture 3 ALU Part 2 Single Cycle CPU Part 1. Pramod V. Argade

CS 61C Summer 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

CISC 662 Graduate Computer Architecture Lecture 6 - Hazards

Lecture 3. Pipelining. Dr. Soner Onder CS 4431 Michigan Technological University 9/23/2009 1

Working on the Pipeline

Recap: A Single Cycle Datapath. CS 152 Computer Architecture and Engineering Lecture 8. Single-Cycle (Con t) Designing a Multicycle Processor

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

Pipelining. Maurizio Palesi

inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 18 CPU Design: The Single-Cycle I ! Nasty new windows vulnerability!

CS 61C Fall 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

CS 152, Spring 2011 Section 2

UC Berkeley CS61C : Machine Structures

Appendix C. Abdullah Muzahid CS 5513

Pipelining! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar DEIB! 30 November, 2017!

Processor Design Pipelined Processor (II) Hung-Wei Tseng

Chapter 4 The Processor 1. Chapter 4B. The Processor

COMP303 Computer Architecture Lecture 9. Single Cycle Control

ECS 154B Computer Architecture II Spring 2009

Final Exam Spring 2017

CPU Organization (Design)

UC Berkeley CS61C : Machine Structures

University of California College of Engineering Computer Science Division -EECS. CS 152 Midterm I

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

The Processor: Datapath & Control

Full Datapath. Chapter 4 The Processor 2

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 28: Single- Cycle CPU Datapath Control Part 1

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

COMPUTER ORGANIZATION AND DESIGN

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

ECE232: Hardware Organization and Design

Single Cycle CPU Design. Mehran Rezaei

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

CS61C : Machine Structures

Ch 5: Designing a Single Cycle Datapath

Pipeline Control Hazards and Instruction Variations

EITF20: Computer Architecture Part2.2.1: Pipeline-1

CPU Pipelining Issues

Modern Computer Architecture

Processor (II) - pipelining. Hwansoo Han

More CPU Pipelining Issues

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Transcription:

inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II 2005-07-21 Andy Carle CS 61C L19 Pipelining II (1)

Review: Datapath for MIPS PC instruction memory rd rs rt registers ALU Data memory +4 imm 1. Instruction Fetch 2. Decode/ Register Read Use datapath figure to represent pipeline IFtch Dcd 3. Execute 4. ory 5. Write Back Exec WB ALU I$ Reg D$ Reg CS 61C L19 Pipelining II (2)

Review: Problems for Computers Limits to pipelining: Hazards prevent next instruction from executing during its designated clock cycle Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away) Control hazards: Pipelining of branches & other instructions stall the pipeline until the hazard; bubbles in the pipeline Data hazards: Instruction depends on result of prior instruction still in the pipeline (missing sock) CS 61C L19 Pipelining II (3)

Review: C.f. Branch Delay vs. Load Delay Load Delay occurs only if necessary (dependent instructions). Branch Delay always happens (part of the ISA). Why not have Branch Delay interlocked? Answer: Interlocks only work if you can detect hazard ahead of time. By the time we detect a branch, we already need its value hence no interlock is possible! CS 61C L19 Pipelining II (4)

FYI: Historical Trivia First MIPS design did not interlock and stall on load-use data hazard Real reason for name behind MIPS: Microprocessor without Interlocked Pipeline Stages Word Play on acronym for Millions of Instructions Per Second, also called MIPS Load/Use Wrong Answer! CS 61C L19 Pipelining II (5)

Outline Pipeline Control Forwarding Control Hazard Control CS 61C L19 Pipelining II (6)

Piped Proc So Far CS 61C L19 Pipelining II (7)

New Representation: Regs more explicit IF/DE DE/EX EX/ME ME/WB Next PC PC Inst. IR Reg A B Exec S D Data S M Reg. IF/DE.Ir = Instruction DE/EX.A = BusA out of Reg EX/ME.S = AluOut EX/ME.D = Bus B pass-through for sw ME/WB.S = ALuOut pass-through ME/WB.M = Result from lw CS 61C L19 Pipelining II (8)

New Representation: Regs more explicit IF/DE DE/EX EX/ME ME/WB Next PC PC Inst. IR Reg A B Exec S D Data S M Reg. What s Missing??? CS 61C L19 Pipelining II (9)

Pipelined Processor (almost) for slides Idea: Parallel Piped Control Valid Data Access Exec Reg. A B S PC Reg Inst. IR Dcd Ctrl IRex Ex Ctrl IRmem Ctrl IRwb WB Ctrl Equal M Next PC D CS 61C L19 Pipelining II (10)

Pipelined Control IR <- [PC]; PC < PC+4; A <- R[rs]; B< R[rt] S < A + B; S < A or ZX; S < A + SX; S < A + SX; If Cond PC < PC+SX; M < [S] [S] <- B R[rd] < S; R[rt] < S; R[rd] < M; Equal Next PC PC Inst. IR Reg A B Exec S D Access M Data Reg. CS 61C L19 Pipelining II (11)

Data Stationary Control The Main Control generates the control signals during Reg/Dec Control signals for Exec (ExtOp, ALUSrc,...) are used 1 cycle later Control signals for (Wr Branch) are used 2 cycles later Control signals for Wr (toreg Wr) are used 3 cycles later Reg/Dec Exec Wr ExtOp ExtOp ALUSrc ALUSrc IF/ID Register Main Control ALUOp RegDst W Branch r toreg ID/Ex Register ALUOp RegDst W Branch r toreg Ex/ Register W rbranch toreg /Wr Register toreg RegWr RegWr RegWr RegWr CS 61C L19 Pipelining II (12)

Let s Try it Out 10 lw r1, 36(r2) 14 addi r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 28 ori r8, r9, 17 32 add r10, r11, r12 100 and r13, r14, 15 CS 61C L19 Pipelining II (13)

Start: Fetch 10 n n n n Inst. IR Decode rs rt im Ctrl WB Ctrl Reg = A B Exec S D Access M Data IF Reg. 10 lw r1, 36(r2) 14 addi r2, r2, 3 Next PC 10 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 PC 34 add r10, r11, r12 CS 61C L19 Pipelining II (14) 100 and r13, r14, 15

Fetch 14, Decode 10 Inst. lw r1, 36(r2) IR Decode 2 rt n n n im Ctrl WB Ctrl Reg Next PC = A B 14 Exec S D Access M Data ID IF Reg. 10 lw r1, 36(r2) 14 addi r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 PC 34 add r10, r11, r12 CS 61C L19 Pipelining II (15) 100 and r13, r14, 15

Fetch 20, Decode 14, Exec 10 Inst. addi r2, r2, 3 IR Decode 2 rt lw r1 36 n Ctrl n WB Ctrl Reg Next PC = r2 B 20 Exec S D Access M Data EX ID IF Reg. 10 lw r1, 36(r2) 14 addi r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 PC 34 add r10, r11, r12 CS 61C L19 Pipelining II (16) 100 and r13, r14, 15

Fetch 24, Decode 20, Exec 14, 10 Inst. sub r3, r4, r5 IR Decode 4 5 addi r2, r2, 3 3 lw r1 Ctrl n WB Ctrl Reg Next PC = r2 B 24 Exec r2+36 D Access M Data M EX ID IF Reg. 10 lw r1, 36(r2) 14 addi r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 PC 34 add r10, r11, r12 CS 61C L19 Pipelining II (17) 100 and r13, r14, 15

Fetch 30, Dcd 24, Ex 20, 14, WB 10 Inst. beq r6, r7 100 IR Decode 6 7 Reg Next PC = CS 61C L19 Pipelining II (18) sub r3 r4 r5 30 PC Exec addi r2 r2+3 D Ctrl Access M[r2+36] Data Note Delayed Branch: always execute ori after beq lw r1 WB M EX ID IF WB Ctrl Reg. 10 lw r1, 36(r2) 14 addi r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

Fetch 100, Dcd 30, Ex 24, 20, WB 14 Inst. ori r8, r9 17 IR Decode 9 xx Reg Next PC = 100 beq r6 r7 100 Exec sub r3 r4-r5 D Ctrl Access addi r2 r2+3 Data WB M EX ID WB Ctrl Reg. r1=m[r2+35] 10 lw r1, 36(r2) 14 addi r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 PC 34 add r10, r11, r12 CS 61C L19 Pipelining II (19) IF 100 and r13, r14, 15

???? Valid Data Access Exec Reg. A B S M Next PC PC Reg Equal Inst. IR Dcd Ctrl IRex Ex Ctrl IRmem Ctrl IRwb WB Ctrl D Remember: means triggered on edge. What is wrong here? CS 61C L19 Pipelining II (20)

Double-Clocked Signals Valid Reg Exec Equal Next PC Inst. IR Dcd Ctrl PC IRex Ex Ctrl IRmem Ctrl IRwb WB Ctrl A B S Access Some signals are double clocked! D M Data Reg. In general: Inputs to edge components are their own pipeline regs Watch out for stalls and such! CS 61C L19 Pipelining II (21)

Administrivia Proj 2 Due Sunday HW6 Due Tuesday Midterm 2: Friday, July 29: 11:00 2:00 Location TBD If you are really so concerned about the drop deadline that this is a problem for you, talk to me about the possibility of taking the exam on Thursday CS 61C L19 Pipelining II (22)

Megahertz Myth? CS 61C L19 Pipelining II (23)

Outline Pipeline Control Forwarding Control Hazard Control CS 61C L19 Pipelining II (24)

Review: Forwarding Fix by Forwarding result as soon as we have it to where we need it: add $t0,$t1,$t2 sub $t4,$t0,$t3 and $t5,$t0,$t6 IF ID/RF EX MEM WB ALU I$ Reg D$ Reg ALU I$ Reg D$ Reg ALU I$ Reg D$ Reg or $t7,$t0,$t8 * xor $t9,$t0,$t10 I$ ALU Reg D$ Reg ALU I$ Reg D$ Reg * or hazard solved by register hardware CS 61C L19 Pipelining II (25)

Forwarding In general: For each stage i that has reg inputs For each stage j after I that has reg output - If i.reg == j.reg forward j value back to i. - Some exceptions ($0, invalid) In particular: ALUinput Input (ALUResult, Result) (Result) CS 61C L19 Pipelining II (26)

Pending Writes In Pipeline Registers IAU npc I mem Regs op rw rs rt PC B A im n alu S n D mem m Regs n CS 61C L19 Pipelining II (27)

Pending Writes In Pipeline Registers IAU Regs B alu S D mem npc I mem op rw rs rt A im n op rw n op rw PC Current operand registers Pending writes hazard <= ((rs == rw ex) & regw ex ) OR ((rs == rw mem) & regw me ) OR ((rs == rw & regw wb) wb ) OR ((rt == rw & regw ex) ex ) OR ((rt == rw & regw mem) me ) OR ((rt == rw wb ) & regw wb ) m n op rw Regs CS 61C L19 Pipelining II (28)

Forwarding Muxes Regs Forward mux B alu S D mem m Regs CS 61C L19 Pipelining II (29) IAU npc I mem A im n op rw n n op rw rs rt op op rw rw PC Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe Increase muxes to add paths from pipeline registers Data Forwarding = Data Bypassing

What about memory operations? Tricky situation: MIPS: lw 0($t0) sw 0($t1) op Rd Ra Rb op Rd Ra Rb A B RTL: R1 <- [ R2 + I ]; [R3+34] <- R1 Rd Rd D R T to reg file CS 61C L19 Pipelining II (30)

What about memory operations? Tricky situation: MIPS: lw 0($t0) sw 0($t1) op Rd Ra Rb op Rd Ra Rb A B RTL: R1 <- [ R2 + I ]; [R3+34] <- R1 Rd D R Solution: Handle with bypass in memory stage! Rd to reg file T CS 61C L19 Pipelining II (31)

Outline Pipeline Control Forwarding Control Hazard Control CS 61C L19 Pipelining II (32)

Data Hazard: Loads (1/4) Forwarding works if value is available (but not written back) before it is needed. But consider lw $t0,0($t1) IF ID/RF EX MEM WB ALU I$ Reg D$ Reg sub $t3,$t0,$t2 ALU I$ Reg D$ Reg Need result before it is calculated! Must stall use (sub) 1 cycle and then forward. CS 61C L19 Pipelining II (33)

Data Hazard: Loads (2/4) Hardware must stall pipeline Called interlock lw $t0, 0($t1) sub $t3,$t0,$t2 and $t5,$t0,$t4 IF ID/RF EX MEM WB ALU I$ Reg D$ Reg bub I$ Reg D$ Reg ble ALU I$ bub Reg D$ Reg ble ALU bub ble or $t7,$t0,$t6 I$ ALU Reg D$ CS 61C L19 Pipelining II (34)

Data Hazard: Loads (3/4) Instruction slot after a load is called load delay slot If that instruction uses the result of the load, then the hardware interlock will stall it for one cycle. If the compiler puts an unrelated instruction in that slot, then no stall Letting the hardware stall the instruction in the delay slot is equivalent to putting a nop in the slot (except the latter uses more code space) CS 61C L19 Pipelining II (35)

Data Hazard: Loads (4/4) Stall is equivalent to nop lw $t0, 0($t1) ALU I$ Reg D$ Reg nop bub ble bub ble bub ble bub ble bub ble sub $t3,$t0,$t2 and $t5,$t0,$t4 ALU I$ Reg D$ Reg ALU I$ Reg D$ Reg or $t7,$t0,$t6 I$ ALU Reg D$ CS 61C L19 Pipelining II (36)

Hazards / Stalling In general: For each stage i that has reg inputs If I s reg is being written later on in the pipe but is not ready yet - Stages 0 to i: Stall (Turn CEs off so no change) - Stage i+1: Make a bubble (do nothing) - Stages i+2 onward: As usual In particular: ALUinput (Result) CS 61C L19 Pipelining II (37)

Hazards / Stalling Alternative Approach: Detect non-forwarding hazards in decode Possible since our hazards are formal. - Not always the case. Stalling then becomes: - Issue nop to EX stage - Turn off nextpc update (refetch same inst) - Turn off InstReg update (re-decode same inst) CS 61C L19 Pipelining II (38)

Stall Logic Regs Forward mux B alu IAU npc I mem op rw rs rt A im n op rw PC 1. Detect nonresolving hazards. 2a. Insert Bubble 2b. Stall nextpc, IF/DE S n op rw D mem m n op rw Regs CS 61C L19 Pipelining II (39)

Stall Logic Stall-on-issue is used quite a bit More complex processors: many cases that stall on issue. More complex processors: cases that can t be detected at decode - E.g. value needed from mem is not in cache proc must stall multiple cycles CS 61C L19 Pipelining II (40)

By the way Notice that our forwarding and stall logic is stateless! Big Idea: Keep it simple! Option 1: Store old fetched inst in reg ( stall_temp ), keep state reg that says whether to use stall_temp or value coming off inst mem. Option 2: Re-fetch old value by turning off PC update. CS 61C L19 Pipelining II (41)