Working on the Pipeline

Similar documents
CS 61C: Great Ideas in Computer Architecture Control and Pipelining

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)

CS 110 Computer Architecture Single-Cycle CPU Datapath & Control

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single- Cycle CPU Datapath & Control Part 2

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Lecture #17: CPU Design II Control

UC Berkeley CS61C : Machine Structures

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

COMP303 Computer Architecture Lecture 9. Single Cycle Control

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

UC Berkeley CS61C : Machine Structures

MIPS-Lite Single-Cycle Control

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

361 control.1. EECS 361 Computer Architecture Lecture 9: Designing Single Cycle Control

CS61C : Machine Structures

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

The Processor: Datapath & Control

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

CS 61C: Great Ideas in Computer Architecture Lecture 12: Single- Cycle CPU, Datapath & Control Part 2

CS3350B Computer Architecture Quiz 3 March 15, 2018

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

CPU Design Steps. EECC550 - Shaaban

CPU Organization (Design)

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 28: Single- Cycle CPU Datapath Control Part 1

Single Cycle CPU Design. Mehran Rezaei

ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath

EEM 486: Computer Architecture. Lecture 3. Designing Single Cycle Control

Major CPU Design Steps

CS61C : Machine Structures

CS359: Computer Architecture. The Processor (A) Yanyan Shen Department of Computer Science and Engineering

CS61C : Machine Structures

University of California College of Engineering Computer Science Division -EECS. CS 152 Midterm I

ECE170 Computer Architecture. Single Cycle Control. Review: 3b: Add & Subtract. Review: 3e: Store Operations. Review: 3d: Load Operations

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Lecture 6 Datapath and Controller

Computer Architecture. Lecture 6.1: Fundamentals of

Designing a Multicycle Processor

UC Berkeley CS61C : Machine Structures

Midterm I March 3, 1999 CS152 Computer Architecture and Engineering

inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 19 CPU Design: The Single-Cycle II & Control !

Instructor: Randy H. Katz hcp://inst.eecs.berkeley.edu/~cs61c/fa13. Fall Lecture #18. Warehouse Scale Computer

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

How to design a controller to produce signals to control the datapath

Chapter 4. The Processor. Computer Architecture and IC Design Lab

inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 18 CPU Design: The Single-Cycle I ! Nasty new windows vulnerability!

CSE 141 Computer Architecture Summer Session Lecture 3 ALU Part 2 Single Cycle CPU Part 1. Pramod V. Argade

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

CS 110 Computer Architecture Review Midterm II

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CENG 3420 Lecture 06: Datapath

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

CS 152 Computer Architecture and Engineering. Lecture 10: Designing a Multicycle Processor

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

CS61C : Machine Structures

CS152 Computer Architecture and Engineering Lecture 10: Designing a Single Cycle Control. Recap: The MIPS Instruction Formats

Ch 5: Designing a Single Cycle Datapath

Processor (I) - datapath & control. Hwansoo Han

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

Lecture 7 Pipelining. Peng Liu.

CPSC614: Computer Architecture

Recap: The MIPS Subset ADD and subtract EEL Computer Architecture shamt funct add rd, rs, rt Single-Cycle Control Logic sub rd, rs, rt

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

The MIPS Processor Datapath

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single- Cycle CPU Datapath Control Part 1

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMP2611: Computer Organization. The Pipelined Processor

Lecture 12: Single-Cycle Control Unit. Spring 2018 Jason Tang

Chapter 4. The Processor

Pipeline: Introduction

CS 61C Fall 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

Chapter 4 The Processor 1. Chapter 4A. The Processor

CS 61C: Great Ideas in Computer Architecture Pipelining and Hazards

Midterm I March 12, 2003 CS152 Computer Architecture and Engineering

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

Final Exam Spring 2017

Midterm I October 6, 1999 CS152 Computer Architecture and Engineering

CS 61C Summer 2016 Guerrilla Section 4: MIPS CPU (Datapath & Control)

Systems Architecture

CS61c Final Review Fall Andy Carle 12/12/2004

CPE 335 Computer Organization. Basic MIPS Architecture Part I

Pipeline design. Mehran Rezaei

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)

Chapter 5 (a) Overview

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single Cycle MIPS CPU

ECE 361 Computer Architecture Lecture 11: Designing a Multiple Cycle Controller. Review of a Multiple Cycle Implementation

Laboratory 5 Processor Datapath

361 multipath..1. EECS 361 Computer Architecture Lecture 10: Designing a Multiple Cycle Processor

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Page 1. Pipelining: Its Natural! Chapter 3. Pipelining. Pipelined Laundry Start work ASAP. Sequential Laundry A B C D. 6 PM Midnight

Guerrilla Session 3: MIPS CPU

Computer Science 61C Spring Friedland and Weaver. The MIPS Datapath

Computer Systems Architecture Spring 2016

Transcription:

Computer Science 6C Spring 27 Working on the Pipeline

Datapath Control Signals Computer Science 6C Spring 27 MemWr: write memory MemtoReg: ALU; Mem RegDst: rt ; rd RegWr: write register 4 PC Ext Imm6 Adder Adder Inst Address npc_sel & Equal Mux PC clk RegDst RegWr busw 32 ALUctr: "add", "sub", "OR",... Extender: zero-ext; sign-ext npc_sel: pc+4; branch-if-equal Rd clk Imm6 Rt Rs 5 5 Rw Ra Rb RegFile 6 ExtOp Rt 5 busa Extender busb 32 32 32 ALUSrc ALUctr Data In clk ALU 32 MemWr 32 WrEn MemtoReg Adr Data Memory 2

Summary of the Control Signals (/2) Computer Science 6C Spring 27 inst Register Transfer add R[rd] R[rs] + R[rt]; PC PC + 4 ALUsrc=RegB, ALUctr= ADD, RegDst=rd, RegWr, npc_sel= +4 sub R[rd] R[rs] R[rt]; PC PC + 4 ALUsrc=RegB, ALUctr= SUB, RegDst=rd, RegWr, npc_sel= +4 ori R[rt] R[rs] + zero_ext(imm6); PC PC + 4 ALUsrc=Im, Extop= Z, ALUctr= OR, RegDst=rt,RegWr, npc_sel= +4 lw R[rt] MEM[ R[rs] + sign_ext(imm6)]; PC PC + 4 ALUsrc=Im, Extop= sn, ALUctr= ADD, MemtoReg, RegDst=rt, RegWr, npc_sel = +4 sw MEM[ R[rs] + sign_ext(imm6)] R[rs]; PC PC + 4 ALUsrc=Im, Extop= sn, ALUctr = ADD, MemWr, npc_sel = +4 beq if (R[rs] == R[rt]) then PC PC + sign_ext(imm6)] else PC PC + 4 npc_sel = br, ALUctr = SUB 3

Summary of the Control Signals (2/2) Computer Science 6C Spring 27 See func We Don t Care :-) Appendix A op add sub ori lw sw beq jump RegDst ALUSrc MemtoReg RegWrite MemWrite npcsel Jump ExtOp ALUctr<2:> x Add x Subtract Or Add x x Add x x x Subtract x x x? x x R-type 3 26 2 6 6 op rs rt rd shamt funct add, sub I-type op rs rt immediate ori, lw, sw, beq J-type op target address jump 4

Boolean Expressions for Controller Computer Science 6C Spring 27 RegDst = add + sub ALUSrc = ori + lw + sw MemtoReg = lw RegWrite = add + sub + ori + lw MemWrite = sw npcsel = beq Jump = jump ExtOp = lw + sw ALUctr[] = sub + beq (assume ALUctr is ADD, SUB, OR) ALUctr[] = or Where: rtype = ~op 5 ~op 4 ~op 3 ~op 2 ~op ~op, ori = ~op 5 ~op 4 op 3 op 2 ~op op lw = op 5 ~op 4 ~op 3 ~op 2 op op sw = op 5 ~op 4 op 3 ~op 2 op op beq = ~op 5 ~op 4 ~op 3 op 2 ~op ~op jump = ~op 5 ~op 4 ~op 3 ~op 2 op ~op How do we implement this in gates? add = rtype func 5 ~func 4 ~func 3 ~func 2 ~func ~func sub = rtype func 5 ~func 4 ~func 3 ~func 2 func ~func 5

Controller Implementation Computer Science 6C Spring 27 opcode func AND logic add sub ori lw sw beq jump OR logic RegDst ALUSrc MemtoReg RegWrite MemWrite npcsel Jump ExtOp ALUctr[] ALUctr[] 6

P&H Figure 4.7 Computer Science 6C Spring 27 7

Summary: Single-cycle Processor Computer Science 6C Spring 27 Five steps to design a processor:. Analyze instruction set à datapath requirements 2. Select set of datapath components & establish clock methodology 3. Assemble datapath meeting the requirements Processor Control Datapath Memory 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. 5. Assemble the control logic Formulate Logic Equations Design Circuits Input Output 8

Single Cycle Performance Computer Science 6C Spring 27 Assume time for actions are ps for register read or write; 2ps for other events Instr Instr fetch Register read ALU op Memory access Register write Total time lw 2ps ps 2ps 2ps ps 8ps sw 2ps ps 2ps 2ps 7ps R-format 2ps ps 2ps ps 6ps beq 2ps ps 2ps 5ps What can we do to improve clock rate? Will this improve performance as well? Want increased clock rate to mean faster programs 9

Gotta Do Laundry Computer Science 6C Spring 27 Alice, Bob, Carol, and Dave A B C D each have one load of clothes to wash, dry, fold, and put away Washer takes 3 minutes Dryer takes 3 minutes Folder takes 3 minutes Stasher takes 3 minutes to put clothes into drawers

Sequential Laundry Computer Science 6C Spring 27 6 PM 7 8 9 2 2 AM T a s k O r d e r A B C D 33 3 3 3 3 3 3 33 3 3 33 3 3 Time Sequential laundry takes 8 hours for 4 loads

Pipelined Laundry Computer Science 6C Spring 27 2 2 AM 6 PM 7 8 9 T a s k O r d e r A B C D 33 3 3 3 3 3 Time Pipelined laundry takes 3.5 hours for 4 loads! 2

Pipelining Lessons (/2) Computer Science 6C Spring 27 Pipelining doesn t help latency of single task, it helps throughput of entire workload Multiple tasks operating simultaneously and independently 6 PM 7 8 9 using different resources Potential speedup = Number pipe stages Time to fill pipeline and time to drain it reduces speedup: 2.3x (8/3.5) v. 4x (8/2) in this example T a s k O r d e A B C D Time 3 3 3 3 3 3 3 3

Pipelining Lessons (2/2) Computer Science 6C Spring 27 Suppose new Washer takes 2 minutes, new Stasher takes 2 minutes. How much faster is pipeline? Pipeline rate limited by slowest pipeline stage Suppose Bob doesn't bother folding his laundry? Idle steps in the pipeline don't enable others to fold Unbalanced lengths and idle stages reduces speedup T a s k O r d e 6 PM 7 8 9 A B C D Time 3 3 3 3 3 3 3 4

Execution Steps in MIPS Datapath Computer Science 6C Spring 27 ) IFtch/IF: Instruction Fetch & Increment PC 2) Dcd/ID: Instruction Decode & Read Registers 3) Exec/EX: Mem-ref: Calculate Address Arith-log: Perform ALU Operation 4) Mem: Load: Read Data from Memory Store: Write Data to Memory Memory is now synchronous 5) WB: Write Data Back to Register 5

Single Cycle Datapath Computer Science 6C Spring 27 PC instruction memory rd rs rt registers ALU Data memory +4 imm. Instruction Fetch 2. Decode/ 3. Execute 4. Memory Register Read 5. Write Back 6

Pipeline registers Computer Science 6C Spring 27 Need registers between stages To hold information produced in previous cycle PC instruction memory rd rs rt registers ALU Data memory +4 imm. Instruction Fetch 2. Decode/ 3. Execute 4. Memory Register Read 5. Write Back 7

More Detailed Pipeline Computer Science 6C Spring 27 8

IF for Load, Store, Computer Science 6C Spring 27 9

ID for Load, Store, Computer Science 6C Spring 27 2

EX for Load Computer Science 6C Spring 27 2

MEM for Load Computer Science 6C Spring 27 22

WB for Load Oops! Computer Science 6C Spring 27 Wrong register number! 23

Corrected Datapath for Load Computer Science 6C Spring 27 24

Pipelined Execution Representation Computer Science 6C Spring 27 Every instruction must take same number of steps, so some stages will idle e.g. MEM stage for any arithmetic instruction Time IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB 25

Graphical Pipeline Diagrams Computer Science 6C Spring 27 Use datapath figure below to represent pipeline: PC MUX +4 instruction memory rd rs rt imm Register File ALU Data memory. Instruction Fetch 2. Decode/ Register Read IF ID EX Mem WB 3. Execute 4. Memory 5. Write Back ALU I$ Reg D$ Reg 26

Graphical Pipeline Representation Computer Science 6C Spring 27 RegFile: left half is write, right half is read Time (clock cycles) I n I$ Reg D$ Reg s Load t I$ Reg D$ Reg r Add O r d e r Store Sub Or ALU I$ ALU Reg I$ ALU Reg I$ D$ ALU Reg Reg D$ ALU Reg D$ Reg 27

Pipelining Performance (/3) Computer Science 6C Spring 27 Use T c ( time between completion of instructions ) to measure speedup Equality only achieved if stages are balanced (i.e. take the same amount of time) If not balanced, speedup is reduced Speedup due to increased throughput Latency for each instrucbon does not decrease 28

Pipelining Performance (2/3) Computer Science 6C Spring 27 Assume time for stages is ps for register read or write 2ps for other stages Instr Instr fetch Register read ALU op Memory access Register write Total time lw 2ps ps 2ps 2ps ps 8ps sw 2ps ps 2ps 2ps 7ps R-format 2ps ps 2ps ps 6ps beq 2ps ps 2ps 5ps What is pipelined clock rate? Compare pipelined datapath with single-cycle datapath 29

Pipelining Performance (3/3) Computer Science 6C Spring 27 Single-cycle T c = 8 ps f =.25GHz Pipelined T c = 2 ps f = 5GHz 3

Clicker/Peer Instruction Computer Science 6C Spring 27 Logic in some stages takes 2ps and in some ps. Clk-Q delay is 3ps and setup-time is 2ps. What is the maximum clock frequency at which a pipelined design can operate? A: GHz B: 5GHz C: 6.7GHz D: 4.35GHz E: 4GHz 3

Project 3... Computer Science 6C Spring 27 Project 3. will be released in a couple of hours In project 3, you will build a CPU in logisim. 3. is the ALU and register file 3.2 is putting together the control logic 32

We Grossly Simplified The Project... Why? Computer Science 6C Spring 27 Last semester and last year it was building effectively a full MIPS Now it is a much smaller architecture with narrower words: Why are we cheating you out of the experience? We use logisim for pedagogical reasons Almost all design these days uses "HDL" (High-Level Design Languages) like VHDL and Verilog In an HDL, doing a 32b, 32 register register file is no harder than doing a 6b, 8 register one But in logisim, it is at least 4x more work... And 4x more chance to make an error 33

Why Nick's Ph.D. Was An Incredibly Stupid Idea... Computer Science 6C Spring 27 My Ph.D. was on a highly pipelined FPGA architecture FPGA -> Field Programmable Gate Array: Basically programmable hardware The design was centered around being able to pipeline multiple independent tasks We will see on Wednesday how to handle "pipeline hazards" and "forwarding: add $s $s $s2 add $s3 $s $s4 This is critical to get real performance gains But my dissertation design didn't have this ability I also showed how you could use the existing registers in the FPGA to heavily pipeline it automatically 34

But pipelining is not free! Computer Science 6C Spring 27 Not only does pipelining not improve latency... It actually makes it worse! Two sources: Unbalanced pipeline stages The setup & clk->q time for the pipeline registers Pipelining only independent tasks also can't "forward" So independent task pipelining is only about reducing cost You can always just duplicate logic instead Latency is fundamental, independent task throughput can always be solved by throwing $$$ at the problem So I proved my Ph.D. design was no better than the conventional FPGA on throughput/$ and far far far worse on latency! 35