Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI


[Figure: Full Datapath, showing the instruction fetch unit, the branch-target path, and the immediate path]

Today's Contents
We have looked at how to design a datapath (Sections 4.4, 4.5). Today, we will:
1. Design a control unit for a single-cycle processor (i.e., how to set the 8 control signals).
2. Learn about a pipelined processor (i.e., a much faster implementation).
[Figure: the five classic components of a computer: Processor (Control and Datapath), Memory, Input, Output]

How to Control the Instruction Fetch Unit?
Our 1st control signal: PC_sel. The PC addresses the instruction memory, which outputs Instruction<31:0>. PC_sel works as follows:
1. PC_sel = 0: increment the PC by 4.
2. PC_sel = 1: take the branch target (PC + 4 + sign-extended imm16 × 4).
[Figure: instruction fetch unit with the PC register, two adders, a sign extender for imm16, and a mux controlled by PC_sel]
After fetching, we execute the instruction →
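
The PC_sel mux can be modeled in a few lines. This is a minimal Python sketch, assuming the encoding above (0 = PC + 4, 1 = branch target) and 32-bit wrap-around arithmetic; the function names are illustrative, not part of the lecture.

```python
MASK32 = 0xFFFFFFFF  # keep results in 32 bits, like the hardware


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit immediate to a 32-bit (unsigned) word."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def next_pc(pc: int, pc_sel: int, imm16: int) -> int:
    """Next-PC mux of the instruction fetch unit.

    pc_sel == 0: PC + 4 (sequential execution)
    pc_sel == 1: PC + 4 + sign_ext(imm16) * 4 (branch target)
    """
    pc_plus_4 = (pc + 4) & MASK32
    if pc_sel == 0:
        return pc_plus_4
    offset = (sign_extend16(imm16) << 2) & MASK32  # word offset -> byte offset
    return (pc_plus_4 + offset) & MASK32


print(hex(next_pc(0x1000, 0, 3)))       # 0x1004: sequential
print(hex(next_pc(0x1000, 1, 3)))       # 0x1010: branch forward 3 words
print(hex(next_pc(0x1000, 1, 0xFFFF)))  # 0x1000: imm16 = -1, branch back 1 word
```

Note that the branch offset is relative to PC + 4, not to the branch's own address.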

Activated Datapath for Executing Add
[Figure: datapath with the path used by add highlighted. Rs and Rt drive the register read ports Ra and Rb; Rd is selected as the write register Rw; busA and busB feed the ALU; the ALU result returns on busW to the 32 x 32-bit register file.]
Control signal settings for add: PC_sel = incr, RegDst = 1 (rd), RegWr = 1, ExtOp = don't care, ALUSrc = 0 (busB), ALUctr = Add, MemWr = 0, MemtoReg = 0.

But How Do We Generate the Correct Control Signals?
Control signals are derived from the instruction:
R-type:      op = 0 [31:26]       | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
Load/Store:  op = 35 or 43 [31:26] | rs [25:21] | rt [20:16] | address [15:0]
Branch:      op = 4 [31:26]       | rs [25:21] | rt [20:16] | address [15:0]
1. opcode: always read.
2. rs: always read.
3. rt: read, except for load.
4. rd (R-type) or rt (load): written, for R-type and load.
5. address: sign-extended, then added.
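
Because every format places its fields at fixed bit positions, the hardware can slice all fields in parallel and let control decide which ones to use. A minimal Python sketch of that slicing follows; the `Fields` type and `decode_fields` name are hypothetical.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    op: int
    rs: int
    rt: int
    rd: int
    shamt: int
    funct: int
    imm16: int
    address26: int


def decode_fields(instr: int) -> Fields:
    """Slice a 32-bit MIPS instruction word into all of its possible fields."""
    return Fields(
        op=(instr >> 26) & 0x3F,      # bits 31:26
        rs=(instr >> 21) & 0x1F,      # bits 25:21
        rt=(instr >> 16) & 0x1F,      # bits 20:16
        rd=(instr >> 11) & 0x1F,      # bits 15:11 (R-type only)
        shamt=(instr >> 6) & 0x1F,    # bits 10:6  (R-type only)
        funct=instr & 0x3F,           # bits 5:0   (R-type only)
        imm16=instr & 0xFFFF,         # bits 15:0  (load/store/branch)
        address26=instr & 0x3FFFFFF,  # bits 25:0  (J-type)
    )


# lw $t0, 4($s1): op = 35, rs = $s1 (17), rt = $t0 (8), address = 4
lw_word = (35 << 26) | (17 << 21) | (8 << 16) | 4
print(decode_fields(lw_word))
```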

Adding Control to the Datapath
[Figure: the control unit, a combinational logic circuit, added to the datapath. Inputs (blue): Op = Instruction<31:26>, Fun = Instruction<5:0>, and the ALU's Zero output. Outputs (red): PC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. The datapath also receives Rs = Instruction<25:21>, Rt = Instruction<20:16>, Rd = Instruction<15:11>, and Imm16 = Instruction<15:0>.]

A Few Examples of Control Signals
(Register transfer, then control signals; a signal listed without a value is asserted.)
ADD: R[rd] ← R[rs] + R[rt]; PC ← PC + 4
  ALUSrc = busB, ALUctr = add, RegDst = rd, RegWr, PC_sel = incr
SUB: R[rd] ← R[rs] - R[rt]; PC ← PC + 4
  ALUSrc = busB, ALUctr = sub, RegDst = rd, RegWr, PC_sel = incr
ORI: R[rt] ← R[rs] OR zero_ext(imm16); PC ← PC + 4
  ALUSrc = Im, ExtOp = Zero, ALUctr = or, RegDst = rt, RegWr, PC_sel = incr
LOAD: R[rt] ← MEM[R[rs] + sign_ext(imm16)]; PC ← PC + 4
  ALUSrc = Im, ExtOp = Sign, ALUctr = add, MemtoReg = 1, RegDst = rt, RegWr, PC_sel = incr
STORE: MEM[R[rs] + sign_ext(imm16)] ← R[rt]; PC ← PC + 4
  ALUSrc = Im, ExtOp = Sign, ALUctr = add, MemWr, PC_sel = incr
BEQ: if (R[rs] == R[rt]) then PC ← PC + 4 + sign_ext(imm16) × 4; else PC ← PC + 4
  PC_sel = Zero output of the ALU, ALUctr = sub
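
The register transfers above can be executed directly as a tiny instruction-level simulator. This Python sketch assumes a 32-entry register list, a word-addressed dictionary for memory, and a `state` dictionary; `extend` and `step` are hypothetical names, and the model ignores timing entirely.

```python
MASK32 = 0xFFFFFFFF


def extend(imm16: int, ext_op: str) -> int:
    """Extender: ext_op is "zero" (ori) or "sign" (lw, sw, beq)."""
    imm16 &= 0xFFFF
    if ext_op == "sign" and imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16


def step(state: dict, name: str, rs: int = 0, rt: int = 0, rd: int = 0, imm16: int = 0) -> None:
    """Apply one instruction's register transfer to state = {"pc", "R", "M"}."""
    R, M = state["R"], state["M"]
    pc_plus_4 = (state["pc"] + 4) & MASK32
    new_pc = pc_plus_4  # PC_sel = incr unless a taken beq overrides it
    if name == "add":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif name == "sub":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif name == "ori":
        R[rt] = R[rs] | extend(imm16, "zero")
    elif name == "lw":
        R[rt] = M[(R[rs] + extend(imm16, "sign")) & MASK32]
    elif name == "sw":
        M[(R[rs] + extend(imm16, "sign")) & MASK32] = R[rt]
    elif name == "beq":
        if R[rs] == R[rt]:  # in hardware: the ALU subtracts and Zero drives PC_sel
            new_pc = (pc_plus_4 + (extend(imm16, "sign") << 2)) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {name}")
    R[0] = 0  # $zero always reads as 0
    state["pc"] = new_pc


state = {"pc": 0, "R": [0] * 32, "M": {}}
step(state, "ori", rs=0, rt=8, imm16=0xFFFF)  # R[8] = 0x0000FFFF (zero-extended)
state["R"][9] = 0x100
step(state, "sw", rs=9, rt=8, imm16=0xFFFC)   # MEM[0x100 - 4] = R[8]
step(state, "lw", rs=9, rt=10, imm16=0xFFFC)  # R[10] = MEM[0xFC]
step(state, "beq", rs=8, rt=10, imm16=2)      # R[8] == R[10], so the branch is taken
print(hex(state["pc"]))                       # 0x18
```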

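Summary of the 8 Control Signals (for 7 Instructions)
See the MIPS reference card for the op and funct values. (RegDst = 1 selects rd; ALUSrc = 1 selects the extended immediate; MemtoReg = 1 selects memory data; x = don't care.)
Signal    | add    | sub      | ori    | lw     | sw     | beq      | jump
funct     | 100000 | 100010   | N/A    | N/A    | N/A    | N/A      | N/A
op        | 000000 | 000000   | 001101 | 100011 | 101011 | 000100   | 000010
RegDst    | 1      | 1        | 0      | 0      | x      | x        | x
ALUSrc    | 0      | 0        | 1      | 1      | 1      | 0        | x
MemtoReg  | 0      | 0        | 0      | 1      | x      | x        | x
RegWrite  | 1      | 1        | 1      | 1      | 0      | 0        | 0
MemWrite  | 0      | 0        | 0      | 0      | 1      | 0        | 0
PCsel     | 0      | 0        | 0      | 0      | 0      | 1        | x
ExtOp     | x      | x        | Zero   | Sign   | Sign   | x        | x
ALUctr<3:0> | Add  | Subtract | Or     | Add    | Add    | Subtract | x
The first 2 columns (add, sub) are identical except for the last row (ALUctr), so they can be combined!

Instruction formats:
R-type: op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]   (add, sub)
I-type: op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]   (ori, lw, sw, beq)
J-type: op [31:26] | target address [25:0]   (jump)

The Concept of Local Decoding
The first two columns of the previous table are collapsed into one R-type column, and the main control outputs a symbolic ALUop instead of ALUctr:
op     | R-type | ori | lw  | sw  | beq      | jump
ALUop  | R-type | Or  | Add | Add | Subtract | x
The main control decodes op (6 bits) into RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, ExtOp, and a 2-bit ALUop. A separate local ALU control combines ALUop with the 6-bit funct field to generate the 4-bit ALUctr, so ALUctr is generated locally based on the funct code. ALUop must distinguish 4 classes (R-type, Or, Add, Subtract), so it needs 2 bits.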
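
The "can be combined" observation can be checked mechanically by comparing columns of the table. This Python sketch encodes the table as reconstructed above (with `None` standing for a don't-care); `CONTROL` and `differing_signals` are illustrative names.

```python
X = None  # don't care

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemWrite", "PCsel", "ExtOp", "ALUctr")

# One column of the summary table per instruction, values in SIGNALS order.
CONTROL = {
    "add":  (1, 0, 0, 1, 0, 0, X, "Add"),
    "sub":  (1, 0, 0, 1, 0, 0, X, "Subtract"),
    "ori":  (0, 1, 0, 1, 0, 0, "Zero", "Or"),
    "lw":   (0, 1, 1, 1, 0, 0, "Sign", "Add"),
    "sw":   (X, 1, X, 0, 1, 0, "Sign", "Add"),
    "beq":  (X, 0, X, 0, 0, 1, X, "Subtract"),
    "jump": (X, X, X, 0, 0, X, X, X),
}


def differing_signals(a: str, b: str) -> list[str]:
    """Names of the control signals whose table entries differ between two columns."""
    return [sig for sig, va, vb in zip(SIGNALS, CONTROL[a], CONTROL[b]) if va != vb]


print(differing_signals("add", "sub"))  # ['ALUctr']
```

Only ALUctr differs, which is why add and sub can share one R-type column once ALUctr is produced locally from funct.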

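The ALU Control
Assume a 2-bit ALUOp derived from the opcode. Next, combinational logic derives the ALU control (ALUCtr) from ALUOp and funct (ALUOp values are given by their symbolic class names):
opcode | ALUOp    | Operation        | funct  | ALU function     | ALUCtr
lw     | Add      | load word        | XXXXXX | add              | 0010
sw     | Add      | store word       | XXXXXX | add              | 0010
beq    | Subtract | branch equal     | XXXXXX | subtract         | 0110
ori    | Or       | or immediate     | XXXXXX | OR               | 0001
R-type | R-type   | add              | 100000 | add              | 0010
R-type | R-type   | subtract         | 100010 | subtract         | 0110
R-type | R-type   | AND              | 100100 | AND              | 0000
R-type | R-type   | OR               | 100101 | OR               | 0001
R-type | R-type   | set-on-less-than | 101010 | set-on-less-than | 0111

ALU Control: the 4-bit ALUCtr selects the ALU function:
0000 AND
0001 OR
0010 add
0110 subtract
0111 set-on-less-than
1100 NOR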
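
The local ALU control is a small lookup: non-R-type classes ignore funct (the XXXXXX rows), while R-type consults it. A minimal Python sketch, assuming the symbolic ALUOp names used in the table; `alu_control` is a hypothetical name.

```python
# 4-bit ALUctr codes from the ALU control table.
ALUCTR = {
    "AND": 0b0000,
    "OR": 0b0001,
    "add": 0b0010,
    "subtract": 0b0110,
    "set-on-less-than": 0b0111,
    "NOR": 0b1100,
}

# funct field -> ALU function, consulted only when ALUOp says "R-type".
FUNCT = {
    0b100000: "add",
    0b100010: "subtract",
    0b100100: "AND",
    0b100101: "OR",
    0b101010: "set-on-less-than",
}

# Non-R-type classes fix the ALU function and ignore funct.
FIXED = {"Add": "add", "Subtract": "subtract", "Or": "OR"}


def alu_control(alu_op: str, funct: int) -> int:
    """Local ALU control: (symbolic ALUOp, 6-bit funct) -> 4-bit ALUctr."""
    if alu_op in FIXED:
        return ALUCTR[FIXED[alu_op]]
    if alu_op == "R-type":
        return ALUCTR[FUNCT[funct]]
    raise ValueError(f"unknown ALUOp class: {alu_op}")


print(format(alu_control("R-type", 0b101010), "04b"))  # 0111 (slt)
print(format(alu_control("Add", 0), "04b"))            # 0010 (lw/sw)
```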

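Truth Table for the Main Control
[Figure: op (6 bits) → Main Control → RegDst, ALUSrc, ..., ALUop (2 bits); ALUop and func (6 bits) → ALU Control (Local) → ALUctr (4 bits)]
op                | R-type | ori    | lw     | sw     | beq      | jump
op code           | 000000 | 001101 | 100011 | 101011 | 000100   | 000010
RegDst            | 1      | 0      | 0      | x      | x        | x
ALUSrc            | 0      | 1      | 1      | 1      | 0        | x
MemtoReg          | 0      | 0      | 1      | x      | x        | x
RegWrite          | 1      | 1      | 1      | 0      | 0        | 0
MemWrite          | 0      | 0      | 0      | 1      | 0        | 0
Branch            | 0      | 0      | 0      | 0      | 1        | 0
Jump              | 0      | 0      | 0      | 0      | 0        | 1
ExtOp             | x      | Zero   | Sign   | Sign   | x        | x
ALUop (symbolic)  | R-type | Or     | Add    | Add    | Subtract | x
ALUop<1>, ALUop<0>: the 2-bit encoding of the symbolic ALUop row.
The non-R-type columns (ori, lw, sw, beq, jump) don't need func.

A Simple Datapath + Control
[Figure: datapath with control, based on the previous truth table: OP and Func inputs, 2-bit ALUop, 4-bit ALUctr]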
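
Reading the truth table row by row gives the main control as sum-of-products logic: each output is the OR of the opcode matches in which it is 1. This Python sketch resolves every don't-care to 0 (or to "Add" for ALUop) and assumes ExtOp = 1 means sign-extend; both choices are assumptions, and `main_control` is a hypothetical name.

```python
# Opcodes of the six instruction classes in the truth table.
R_TYPE, ORI, LW, SW, BEQ, JUMP = 0b000000, 0b001101, 0b100011, 0b101011, 0b000100, 0b000010


def main_control(op: int) -> dict:
    """Main control as combinational logic over the 6-bit opcode.

    Don't-care entries are resolved to 0 here (or to "Add" for ALUop).
    ExtOp uses an assumed encoding: 1 = sign-extend, 0 = zero-extend.
    """
    r_type, ori, lw, sw, beq, jump = (op == c for c in (R_TYPE, ORI, LW, SW, BEQ, JUMP))
    return {
        "RegDst": int(r_type),
        "ALUSrc": int(ori or lw or sw),
        "MemtoReg": int(lw),
        "RegWrite": int(r_type or ori or lw),
        "MemWrite": int(sw),
        "Branch": int(beq),
        "Jump": int(jump),
        "ExtOp": int(lw or sw),
        "ALUop": "R-type" if r_type else "Or" if ori else "Subtract" if beq else "Add",
    }


# Compare with the lw column: RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemWrite=0
print(main_control(LW))
```

Any consistent resolution of the don't-cares is valid; hardware designers pick whichever minimizes the gate count.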

[Figure: R-Type Instruction: datapath and control, with func driving ALUctr]
[Figure: Load Instruction: datapath and control, with ALUctr = add]

[Figure: Branch-on-Equal Instruction (beq): datapath and control, with ALUctr = sub]

Finally, Implementing Jumps (j)
J-type: op = 2 [31:26] | address [25:0]
Jump uses word addressing. It updates the PC with the concatenation of:
1. the most significant 4 bits of the current PC + 4,
2. the 26-bit jump address,
3. two zero bits (i.e., the address is shifted left by 2 bits), giving a 32-bit address.
Now we need a new control signal, Jump, decoded from the opcode.
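
The concatenation above is easy to get wrong by using PC instead of PC + 4 for the upper bits. A minimal Python sketch of the target computation; `jump_target` is a hypothetical name.

```python
def jump_target(pc: int, address26: int) -> int:
    """Jump target = {(PC + 4)[31:28], address26, 00}."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000              # most significant 4 bits of PC + 4
    word_addr = (address26 & 0x3FFFFFF) << 2     # word address -> byte address
    return upper4 | word_addr


print(hex(jump_target(0x00400000, 0x0100000)))  # 0x400000
print(hex(jump_target(0x8FFFFFFC, 0x0000001)))  # 0x90000004: PC + 4 enters a new 256 MB region
```

Since only 26 bits are stored, a jump can reach any word within the 256 MB region selected by the upper 4 bits of PC + 4.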

[Figure: Datapath with Jumps Added; the upper 4 bits of PC + 4 are concatenated with the shifted jump address]

Performance Issues
Yes, the single-cycle CPU above works correctly, but its longest delay determines the CPU clock cycle.
What is the critical (longest) path in the processor? The load instruction:
instruction memory → register file → ALU → data memory → register file
The path could be even longer with floating-point instructions.
The design works, but it violates the design principle of making the common case fast.
Next, we will improve it using pipelining.

Pipelining Is Natural! Pipelining Analogy
The classic laundry example: washer, dryer, folder, storer (each step takes 0.5 hours).
Sequential: total = 8 hours. Pipelined (overlapping execution): total = 3.5 hours.
For 4 loads: speedup = 8/3.5 ≈ 2.3×.
Non-stop (in steady state, n loads): speedup ≈ 2n/0.5n = 4 = number of stages.

Important Lessons about Pipelining
1. Pipelining doesn't help the latency of a single task, but it improves the throughput of the entire workload.
2. Multiple tasks operate simultaneously using different resources (i.e., in parallel).
3. Potential speedup = number of pipeline stages.
4. The pipeline rate is limited by the slowest stage; unbalanced stage lengths reduce speedup.
5. The time to fill and drain the pipeline reduces speedup.
6. The pipeline may stall for dependences.
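
The laundry numbers follow from a simple count: sequentially every load takes all stages, while pipelined the first load takes all stages and each later load finishes one stage time after the previous. A Python sketch of that arithmetic, assuming equal 0.5-hour stages; the function names are illustrative.

```python
def sequential_time(loads: int, stages: int, stage_time: float) -> float:
    """Each load runs all stages before the next load starts."""
    return loads * stages * stage_time


def pipelined_time(loads: int, stages: int, stage_time: float) -> float:
    """The first load takes `stages` steps; each later load finishes one step after the previous."""
    return (stages + loads - 1) * stage_time


def laundry_speedup(loads: int, stages: int = 4, stage_time: float = 0.5) -> float:
    return sequential_time(loads, stages, stage_time) / pipelined_time(loads, stages, stage_time)


print(sequential_time(4, 4, 0.5), pipelined_time(4, 4, 0.5))  # 8.0 3.5
print(round(laundry_speedup(4), 1))                            # 2.3
print(laundry_speedup(1_000_000))                              # just under 4 = number of stages
```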

The MIPS CPU Pipeline
5 pipeline stages on MIPS processors:
1. IF: Instruction fetch from memory
2. ID: Instruction decode and register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register file

Pipeline Stage Performance
Assume the stage times are: 100 ps for the ID stage, 100 ps for the WB stage, and 200 ps for all other stages.
Performance of the old single-cycle datapath design:
Instruction | Instr fetch | Register read | ALU op | Memory access | Register write back | Total time
lw          | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps              | 800 ps
sw          | 200 ps      | 100 ps        | 200 ps | 200 ps        |                     | 700 ps
R-format    | 200 ps      | 100 ps        | 200 ps |               | 100 ps              | 600 ps
beq         | 200 ps      | 100 ps        | 200 ps |               |                     | 500 ps
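
The table's totals, and the two clock periods compared next, come from summing and maximizing these stage times. A Python sketch using the slide's assumed latencies; `STAGE_PS`, `USES`, and `instr_time` are illustrative names.

```python
# Stage latencies assumed on the slide (ps).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually uses in the single-cycle datapath.
USES = {
    "lw": ("IF", "ID", "EX", "MEM", "WB"),
    "sw": ("IF", "ID", "EX", "MEM"),
    "R-format": ("IF", "ID", "EX", "WB"),
    "beq": ("IF", "ID", "EX"),
}


def instr_time(name: str) -> int:
    return sum(STAGE_PS[s] for s in USES[name])


for name in USES:
    print(f"{name:9s}{instr_time(name)} ps")

# A single-cycle clock must fit the slowest instruction (lw, the critical path).
single_cycle_clock = max(instr_time(n) for n in USES)
# A pipelined clock must fit the slowest stage.
pipelined_clock = max(STAGE_PS.values())
print(single_cycle_clock, pipelined_clock)  # 800 200
```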

Single-Cycle vs. Pipelined CPU
Single-cycle: clock cycle (CC) = 800 ps. Pipelined: CC = 200 ps, because 200 ps is the slowest stage time. This gives a 4× speedup (in throughput)!

A Simple, Convenient Pipeline Representation
[Figure: time runs left to right and program flow runs top to bottom; each of six instructions passes through IFetch, ID, Exec, Mem, WB, starting one cycle after the previous instruction]
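
The staircase representation can be generated programmatically: instruction i occupies stage s in cycle i + s. This Python sketch assumes an ideal pipeline with no stalls; `pipeline_schedule` is a hypothetical name.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def pipeline_schedule(n: int) -> list[list[str]]:
    """rows[i][c] is the stage instruction i occupies in cycle c ('' if none)."""
    total_cycles = n + len(STAGES) - 1
    rows = []
    for i in range(n):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_schedule(6)):
    print(f"I{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

With 6 instructions the schedule spans 6 + 4 = 10 cycles, i.e., 2000 ps at 200 ps per cycle, versus 6 × 800 = 4800 ps on the single-cycle design.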

Pipeline Speedup
If all stages are balanced (i.e., all stages take the same time), the time between instructions is
$$t_{\text{pipelined}} = \frac{t_{\text{nonpipelined}}}{\#\text{ of stages}}$$
If the stages are not balanced, the speedup is smaller.
The speedup comes from increased throughput. Note: latency (i.e., the time of each instruction) does not necessarily improve!
Under ideal conditions and with many instructions, the speedup equals the number of stages.

MIPS's ISA Design Is Suitable for Pipelining
1. All MIPS instructions are 32 bits, which makes them easier to fetch and decode. By contrast, x86 (CISC) has 1- to 17-byte instructions, which are more difficult: the next PC is PC + ?? (it depends on the instruction length).
2. MIPS has very regular instruction formats, so we can decode and read registers simultaneously in one stage.
3. Only load/store can access memory, so we can calculate the address in the EX stage and access memory in the MEM stage (i.e., EX, MEM, WB).
4. Memory operands are aligned, so a memory access takes only one cycle (in one stage): one data transfer.
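
Both caveats, unbalanced stages and no latency gain, show up in a direct comparison. This Python sketch assumes the single-cycle clock equals the sum of all stage times (the lw path) and that the pipelined clock is set by the slowest stage; the evenly split 160 ps stages are a hypothetical comparison point.

```python
def pipeline_speedup(n: int, stage_times: list[int]) -> float:
    """Single-cycle time for n instructions divided by pipelined time.

    Single-cycle: every instruction takes one long cycle = sum of all stage times.
    Pipelined: the clock is set by the slowest stage, and n instructions need
    (number of stages + n - 1) cycles, including fill and drain.
    """
    single_cycle = n * sum(stage_times)
    pipelined = (len(stage_times) + n - 1) * max(stage_times)
    return single_cycle / pipelined


unbalanced = [200, 100, 200, 200, 100]  # the slide's stage times (ps)
balanced = [160] * 5                    # hypothetical: the same 800 ps split evenly

print(pipeline_speedup(1, unbalanced))                # 0.8: a single instruction gets slower
print(round(pipeline_speedup(10**6, unbalanced), 2))  # about 4, limited by the 200 ps stages
print(round(pipeline_speedup(10**6, balanced), 2))    # about 5 = number of stages
```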

Pipeline Hazards
Hazards: situations in which the next instruction cannot execute in the next cycle.
1. Structural hazards: a required resource (e.g., memory) is occupied.
2. Data hazards: an instruction must wait for a previous instruction to complete its data read/write.
3. Control hazards: an instruction depends on a control action of a previous instruction (e.g., a branch instruction such as beq).

Structural Hazards
A structural hazard is a conflict for a resource that is already in use (e.g., a single memory unit).
Suppose the MIPS pipeline had a single memory:
1. Load/store instructions need the memory unit to access data.
2. Instruction fetch would then have to stall for that cycle, causing a pipeline bubble.
Hence, MIPS pipelines require two separate memories, one for instructions and one for data, to avoid this structural hazard.
Fortunately, MIPS is designed so that there are no structural hazards.
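
With a single shared memory, the conflict occurs exactly when a load/store reaches MEM in the same cycle as a later instruction's IF. This Python sketch assumes one instruction issued per cycle and no stalls; `single_memory_conflicts` is a hypothetical name.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def single_memory_conflicts(program: list[str]) -> list[tuple[int, int, int]]:
    """Find cycles where a load/store's MEM stage and another instruction's IF
    stage would both need a single shared memory (no stalls inserted).

    Returns (cycle, fetching_instruction, memory_instruction) triples.
    """
    conflicts = []
    for i, name in enumerate(program):
        if name in ("lw", "sw"):
            cycle = i + STAGES.index("MEM")       # when instruction i uses memory
            fetcher = cycle - STAGES.index("IF")  # instruction fetched in that cycle
            if fetcher < len(program):
                conflicts.append((cycle, fetcher, i))
    return conflicts


print(single_memory_conflicts(["lw", "add", "sub", "or"]))   # [(3, 3, 0)]
print(single_memory_conflicts(["add", "sub", "and", "or"]))  # []
```

With separate instruction and data memories, these overlaps stop being conflicts, which is the point of the MIPS design choice.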

How about Data Hazards?
An instruction depends on the completion of a data access by a previous instruction:
add $s0, $t0, $t1
sub $t2, $s0, $t3
The sub must wait 3 cycles for add to write $s0.
[Figure: graphical representation of the instruction pipeline. Shading on the right half of a register file means it is read; shading on the left half means it is written.]
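
The 3-cycle wait can be derived from stage positions: add writes $s0 in WB (cycle 4 if add enters IF in cycle 0), while sub would read it in ID (cycle 2). This Python sketch assumes no forwarding and that a register written in WB can only be read in ID in a later cycle; `raw_stalls` and the tuple format are hypothetical.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def raw_stalls(producer: tuple, consumer: tuple, distance: int = 1) -> int:
    """Stall cycles the consumer needs before reading a register the producer writes.

    Instructions are (opcode, dest, src1, src2) tuples; `distance` is how many
    instructions after the producer the consumer is issued. Assumes no forwarding,
    and that a register written in the producer's WB cycle can be read in ID only
    in a later cycle.
    """
    _, dest, *_ = producer
    _, _, *sources = consumer
    if dest not in sources:
        return 0                              # no read-after-write dependence
    wb_cycle = STAGES.index("WB")             # producer issued in cycle 0
    id_cycle = distance + STAGES.index("ID")  # consumer reads registers here
    return max(0, wb_cycle + 1 - id_cycle)


add = ("add", "$s0", "$t0", "$t1")
sub = ("sub", "$t2", "$s0", "$t3")
print(raw_stalls(add, sub))  # 3
```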