Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Similar documents
CPE 335. Basic MIPS Architecture Part II

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

Processor: Multi- Cycle Datapath & Control

The Processor: Datapath & Control

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

ECE369. Chapter 5 ECE369

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Systems Architecture I

Design of the MIPS Processor

Computer Science 141 Computing Hardware

Design of the MIPS Processor (contd)

CSE 2021 COMPUTER ORGANIZATION

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

Multiple Cycle Data Path

Chapter 5: The Processor: Datapath and Control

EECE 417 Computer Systems Architecture

CSE 2021 COMPUTER ORGANIZATION

Major CPU Design Steps

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

Lets Build a Processor

LECTURE 6. Multi-Cycle Datapath and Control

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR

CC 311- Computer Architecture. The Processor - Control

RISC Processor Design

Chapter 4 The Processor (Part 2)

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

Chapter 5 Solutions: For More Practice

Multicycle Approach. Designing MIPS Processor

Topic #6. Processor Design

ENE 334 Microprocessors

Processor (multi-cycle)

Processor (I) - datapath & control. Hwansoo Han

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

LECTURE 5. Single-Cycle Datapath and Control

Lecture 5: The Processor

Alternative to single cycle. Drawbacks of single cycle implementation. Multiple cycle implementation. Instruction fetch

Points available Your marks Total 100

Systems Architecture

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Data Paths and Microprogramming

Control & Execution. Finite State Machines for Control. MIPS Execution. Comp 411. L14 Control & Execution 1

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

RISC Architecture: Multi-Cycle Implementation

Review: Abstract Implementation View

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

CENG 3420 Lecture 06: Datapath

RISC Architecture: Multi-Cycle Implementation

RISC Design: Multi-Cycle Implementation

ECE232: Hardware Organization and Design

CPU Design Steps. EECC550 - Shaaban

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

Chapter 4. The Processor. Computer Architecture and IC Design Lab

COMP303 Computer Architecture Lecture 9. Single Cycle Control

Designing a Multicycle Processor

Computer and Information Sciences College / Computer Science Department Enhancing Performance with Pipelining

TDT4255 Computer Design. Lecture 4. Magnus Jahre. TDT4255 Computer Design

October 24. Five Execution Steps

MIPS-Lite Single-Cycle Control

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Single vs. Multi-cycle Implementation

CSEN 601: Computer System Architecture Summer 2014

CSSE232 Computer Architecture I. Mul5cycle Datapath

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

CSE Lecture In Class Example Handout

Note- E~ S. \3 \S U\e. ~ ~s ~. 4. \\ o~ (fw' \i<.t. (~e., 3\0)

CPE 335 Computer Organization. Basic MIPS Architecture Part I

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

CSE Computer Architecture I Fall 2009 Lecture 13 In Class Notes and Problems October 6, 2009

Single Cycle CPU Design. Mehran Rezaei

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

COMP2611: Computer Organization. The Pipelined Processor

CPU Organization (Design)

Multicycle conclusion

The MIPS Processor Datapath

Inf2C - Computer Systems Lecture Processor Design Single Cycle

The Processor: Datapath & Control

Computer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University

Using a Hardware Description Language to Design and Simulate a Processor 5.8

COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Design of Digital Circuits Lecture 13: Multi-Cycle Microarch. Prof. Onur Mutlu ETH Zurich Spring April 2017

Digital Design & Computer Architecture (E85) D. Money Harris Fall 2007

Laboratory 5 Processor Datapath

Single Cycle Data Path

The overall datapath for RT, lw,sw beq instrucution

Working on the Pipeline

Chapter 4. The Processor

Fundamentals of Computer Systems

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Processor Implementation in VHDL. University of Ulster at Jordanstown University of Applied Sciences, Augsburg

CS 152 Computer Architecture and Engineering. Lecture 10: Designing a Multicycle Processor

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

ECS 154B Computer Architecture II Spring 2009

Transcription:

Lecture 8: Control COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August 1

Datapath and Control Datapath The collection of state elements, computation elements, and interconnections that together provide a conduit for the flow and transformation of data in the processor during execution. - DIA Control The component of the processor that commands the datapath, memory, and I/O devices according to the instructions of the program. P&H 2

3 A Real MIPS Datapath

4 Datapath with Control

Datapath with Control Memory reference: lw, sw Arithmetic/logical: add, sub, and, or, slt Control flow: beq, j (see book) 5

The Control Unit Generates Control Signals RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite 6

The Control Unit Generates Control Signals Uses Op Field [31-26] 7

Control Signals PCSrc True: PC = SignExt(Imm 16 ) << 2 + PC + 4 False: PC = PC + 4 PCSrc = Branch & Zero (Branch from Control, Zero from ALU) 8

Control Signals RegDst True: WriteReg = Inst[15-11] False: WriteReg = Inst[20-16] 9

Control Signals ALUSrc True: SignExt(Imm 16 ) False: RegisterData2 10

Control Signals ALUControl Values: And, Or, Add, Sub Uses Funct Field ALUControl Function 000 AND 001 OR 010 add 110 subtract 111 set on less than 11

ALUControl ALUOp Funct field Operation ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 0 0 X X X X X X 010 X 1 X X X X X X 110 1 X X X 0 0 0 0 010 1 X X X 0 0 1 0 110 1 X X X 0 1 0 0 000 1 X X X 0 1 0 1 001 1 X X X 1 0 1 0 111 lw/sw beq R-Type ALUControl Function 000 AND 001 OR 010 add 110 subtract 111 set on less than 12

13 ALU Control Combinational Implementation

Control Signals MemtoReg True: RegisterWriteData = MemReadData False: RegisterWriteData = ALUResult 14

Control Signals MemRead MemWrite RegWrite 15

The Control Unit Generates Control Signals Uses Op Field [31-26] 16

Control Design: Inputs Opcode op5 op4 op3 op2 op1 op0 Value R-Format 0 0 0 0 0 0 0 10 lw 1 0 0 0 1 1 35 10 sw 1 0 1 0 1 1 43 10 beq 0 0 0 1 0 0 4 10 Where were these values chosen? 17

Control Design: Outputs Signal R-Format lw sw beq RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp2 You fill it out with: 0, 1, X (don t care) - check with book 18

19 Datapath with Control

20 Combinational Implementation

21

Single Cycle Implementation All of the logic is combinational We wait for everything to settle down ALU might not produce right answer right away Use write signals along with clock to determine when to write Cycle time determined by length of the longest path

Single Cycle Implementation Calculate cycle time assuming negligible delays except: memory (2ns), ALU/adders (2ns), register file access (1ns)

Single Cycle Implementation 2ns 1ns 2ns 2ns 1ns Load Word is longest running instruction (prove to yourself by computing time for sw, br, R-type) memory (2ns), ALU/adders (2ns), register file access (1ns) 2ns + 1ns + 2ns + 2ns + 1ns = 8ns

Magic Moment! At this point, you should be able to design a computer! 1. Analyze instruction set for datapath requirements 2. Select set of datapath components 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control signals 5. Assemble the control logic 6. Set clock rate 25

What s Wrong with Single Cycle? Calculate cycle time assuming negligible delays except: memory (2ns), ALU/adders (2ns), register file access (1ns)

What s Wrong with Single Cycle? memory (2ns), ALU/adders (2ns), register file access (1ns) Inst. Inst. Mem Reg. Read ALU Data Mem Reg. Write Total ALU 2 1 2 1 6 lw 2 1 2 2 1 8 sw 2 1 2 2 7 br 2 1 2 5 27

What s Wrong with Single Cycle? Arithmetic & Logical PC Inst Memory Reg File mux ALU mux setup Load PC Inst Memory Reg File mux ALU Data Mem mux setup Critical Path Store PC Inst Memory Reg File mux ALU Data Mem Branch PC Inst Memory Reg File cmp mux Long Cycle Time All instructions take as much time as the slowest Real memory is not so nice as our idealized memory (cannot always get the job done in fixed amount of time) 28

How Bad? Assume: 100 instructions executed 25% of instructions are loads (8ns), 10% of instructions are stores (7ns), 45% of instructions are adds (6ns), and 20% of instructions are branches (5ns). Single-cycle execution: 100 * 8ns = 800 ns Optimal execution: 25*8ns + 10*7ns + 45*6ns + 20*5ns = 640 ns Speedup = 800/640 = 1.25 29

Other Problems Instruction and Data Memory are the same Some units hold value long after job is complete Could reuse ALU for example Underutilized resources (wasteful of area/power) 30

Other Problems What about floating point or other VERY LONG instructions? 31

32

Multicycle Approach Break up the instructions into steps, one per cycle balance the amount of work to be done restrict each cycle to use only one major functional unit At the end of a cycle store values for use in later cycles (easiest thing to do) introduce additional internal registers to hold values between cycles

Multicycle Approach Architectural vs. Microarchitectural State? Register File?

Multicycle Control More Mux Magic

Multicycle Control Reuse datapath components ALU used to compute address and to increment PC Memory used for instruction and data Control signals not determined solely by instruction What should the ALU do for a subtract instruction?

37 Multicycle Control

Use Finite State Machine For Control Set of states Next state function (determined by current state and the input) Output function (determined by current state and possibly input) We ll use a Moore machine (output based only on current state) To derive FSM, need steps of the instruction!

39

Five Execution Steps 1. Instruction Fetch 2. Instruction Decode and Register Fetch 3. Execution, Memory Address Computation, or Branch Completion 4. Memory Access or R-type instruction completion 5. Write-Back Step INSTRUCTIONS TAKE FROM 3-5 STEPS/CYCLES!

Step 1: Instruction Fetch Use PC to get instruction and put it in the Instruction Register. Increment the PC by 4 and put result back in the PC. Can be described succinctly using RTL "Register- Transfer Language" IR = Memory[PC]; PC = PC + 4; Can we figure out the values of the control signals? ALU? IR? What is the advantage of updating the PC now?

Step 1: Instruction Fetch IR = Memory[PC]; PC = PC + 4;

Step 2: Instruction Decode & Register Fetch Read registers rs and rt in case we need them Compute the branch address in case the instruction is a branch RTL: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (SgnExt(IR[15-0]) << 2); Note: Again, no control lines based on the instruction type (busy "decoding" it in control logic) Reg? ALU? What if not branch instruction?

Step 2: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (SgnExt(IR[15-0]) << 2);

Step 3: (Instruction Dependent) ALU performs a function based on instruction type: Memory Reference: ALUOut = A + sign-extend(ir[15-0]); R-type: ALUOut = A op B; Branch: if (A==B) PC = ALUOut; ALU Utilization rate to step 3?

Step 4: (R-type or memory-access) Loads and stores access memory MDR = Memory[ALUOut]; or Memory[ALUOut] = B; R-type instructions finish Reg[IR[15-11]] = ALUOut; Writes takes place at the end of the cycle on the edge

Step 5: Write-Back Reg[IR[20-16]]= MDR; What about all the other instructions?

Summary Of Steps Step name Instruction fetch Instruction decode/register fetch Action for R-type instructions Action for memory-reference Action for instructions branches IR = Memory[PC] PC = PC + 4 A = Reg [IR[25-21]] B = Reg [IR[20-16]] ALUOut = PC + (sign-extend (IR[15-0]) << 2) Action for jumps Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] II computation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2) jump completion Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut] completion ALUOut or Store: Memory [ALUOut] = B Memory read completion Load: Reg[IR[20-16]] = MDR

Implementing the Control Values of control signals are dependent upon: 1. instruction is being executed 2. step is being performed Using steps, specify a finite state machine specify the finite state machine graphically, or use microprogramming (more on this next time) Implementation can be derived from specification

Graphical Specification of FSM

Graphical Specification of FSM Add r1, r2, r3 sw r1, 100(r2) jr r31

Performance Evaluation What is the average CPI? state diagram gives CPI for each instruction type workload gives frequency of each type Type CPI i for type Frequency CPI i x freqi i Arith/Logic 4 40% 1.6 Load 5 30% 1.5 Store 4 10% 0.4 branch 3 20% 0.6 Average CPI: 4.1 52

Single/Multi-Cycle Summary Single Cycle Datapath: CPI = 1!! Long cycle time L (critical path based) Multiple Cycle Datapath: Short cycle time!! CPI = 3-5 L Can we achieve a CPI of 1 (on average) with a clock cycle time similar to the multiple cycle datapath? 53