COMP303 - Computer Architecture Lecture 10. Multi-Cycle Design & Exceptions

Similar documents
Lecture 10 Multi-Cycle Implementation

Major CPU Design Steps

CPU Design Steps. EECC550 - Shaaban

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W10-M

Designing a Multicycle Processor

RISC Processor Design

Processor: Multi- Cycle Datapath & Control

Lecture 5: The Processor

CPE 335. Basic MIPS Architecture Part II

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

Computer Organization & Design The Hardware/Software Interface Chapter 5 The processor : Datapath and control

COMP303 Computer Architecture Lecture 9. Single Cycle Control

ALUOut. Registers A. I + D Memory IR. combinatorial block. combinatorial block. combinatorial block MDR

What do we have so far? Multi-Cycle Datapath (Textbook Version)

ENE 334 Microprocessors

CS 152 Computer Architecture and Engineering. Lecture 10: Designing a Multicycle Processor

LECTURE 6. Multi-Cycle Datapath and Control

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

RISC Architecture: Multi-Cycle Implementation

Systems Architecture I

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

MIPS-Lite Single-Cycle Control

RISC Architecture: Multi-Cycle Implementation

361 control.1. EECS 361 Computer Architecture Lecture 9: Designing Single Cycle Control

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

RISC Design: Multi-Cycle Implementation

CSE 2021 COMPUTER ORGANIZATION

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)

Initial Representation Finite State Diagram. Logic Representation Logic Equations

ECE 313 Computer Organization EXAM 2 November 9, 2001

Initial Representation Finite State Diagram Microprogram. Sequencing Control Explicit Next State Microprogram counter

Points available Your marks Total 100

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

CSE 2021 COMPUTER ORGANIZATION

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CPU Organization (Design)

CC 311- Computer Architecture. The Processor - Control

CS152 Computer Architecture and Engineering Lecture 13: Microprogramming and Exceptions. Review of a Multiple Cycle Implementation

Alternative to single cycle. Drawbacks of single cycle implementation. Multiple cycle implementation. Instruction fetch

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Lab 8: Multicycle Processor (Part 1) 0.0

Topic #6. Processor Design

Chapter 5 Solutions: For More Practice

Microprogramming. Microprogramming

CS61C : Machine Structures

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Lecture #17: CPU Design II Control

ECE 313 Computer Organization FINAL EXAM December 14, This exam is open book and open notes. You have 2 hours.

Multi-cycle Approach. Single cycle CPU. Multi-cycle CPU. Requires state elements to hold intermediate values. one clock cycle or instruction

CSE 2021: Computer Organization Fall 2010 Solution to Assignment # 3: Multicycle Implementation

Chapter 5: The Processor: Datapath and Control

Working on the Pipeline

CS 110 Computer Architecture Single-Cycle CPU Datapath & Control

ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

Control Unit for Multiple Cycle Implementation

Recap: A Single Cycle Datapath. CS 152 Computer Architecture and Engineering Lecture 8. Single-Cycle (Con t) Designing a Multicycle Processor

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

CS152 Computer Architecture and Engineering. Lecture 8 Multicycle Design and Microcode John Lazzaro (

EECS 322 Computer Architecture Improving Memory Access: the Cache

CS61C : Machine Structures

Multiple Cycle Data Path

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

ECE369. Chapter 5 ECE369

CS359: Computer Architecture. The Processor (A) Yanyan Shen Department of Computer Science and Engineering

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CSE 141 Computer Architecture Summer Session Lecture 3 ALU Part 2 Single Cycle CPU Part 1. Pramod V. Argade

UC Berkeley CS61C : Machine Structures

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

Single-Cycle Examples, Multi-Cycle Introduction

Mapping Control to Hardware

ECE 313 Computer Organization Name SOLUTION EXAM 2 November 3, Floating Point 20 Points

The Processor: Datapath & Control

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY Computer Organization (COMP 2611) Spring Semester, 2014 Final Examination

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

University of California College of Engineering Computer Science Division -EECS. CS 152 Midterm I

UC Berkeley CS61C : Machine Structures

EEM 486: Computer Architecture. Lecture 3. Designing Single Cycle Control

The Processor: Datapath & Control

Implementing the Control. Simple Questions

UC Berkeley CS61C : Machine Structures

Single Cycle CPU Design. Mehran Rezaei

How to design a controller to produce signals to control the datapath

Ch 5: Designing a Single Cycle Datapath

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single- Cycle CPU Datapath & Control Part 2

Chapter 4. The Processor. Computer Architecture and IC Design Lab

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 28: Single- Cycle CPU Datapath Control Part 1

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

Processor (I) - datapath & control. Hwansoo Han

Design of the MIPS Processor

inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 18 CPU Design: The Single-Cycle I ! Nasty new windows vulnerability!

Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control

ECE232: Hardware Organization and Design

Computer Architecture Chapter 5. Fall 2005 Department of Computer Science Kent State University

ECE473 Computer Architecture and Organization. Processor: Combined Datapath

Multicycle Approach. Designing MIPS Processor

Lets Build a Processor

Transcription:

COP33 - Computer Architecture Lecture ulti-cycle Design & Exceptions

Single Cycle Datapath We designed a processor that requires one cycle per instruction RegDst busw 32 Clk RegWr Rd ux imm6 Rt 5 5 Rs 5 Rw Ra Rb 32 32-bit Registers 6 Rt busb 32 Sign Extender npc_sel Clk busa 32 32 ux ALUSrc Fetch Unit ALUctr Data In ALU 32 Clk Equal 32 <3:> Rt <2:25> WrEn Rs <6:2> emwr Adr Data emory Rd emrd <:5> 32 <:5> Imm6 ux emtoreg

What s wrong with our CPI= processor? Arithmetic & Logical PC Inst emory Reg File mux ALU mux setup Load PC Inst emory Reg File mux ALU Data em mux setup Critical Path Store PC Inst emory Reg File mux ALU Data em Branch PC Inst emory Reg File cmp mux Long cycle time All instructions take as much time as the slowest Real memory is not so nice as our idealized memory cannot always get the job done in one (short) cycle

Reducing Cycle Time Cut combinational dependency graph and insert register / latch Do same work in two fast cycles, rather than one slow one storage element storage element Acyclic Combinational Logic Acyclic Combinational Logic (A) => storage element storage element Acyclic Combinational Logic (B) storage element

Basic Limits on Cycle Time Next address logic PC <= branch? PC + 4 + offset : PC + 4 Fetch Reg <= em[pc] Operand Fetch A <= R[rs], B <= R[rt] ALU execution ALUOut <= A op B Data emory Access <= em[aluout] Result Store R[rd] <= ALUOut Control npc_sel ALUSrc ALUctr emrd emwr RegDst RegWr Next PC PC Fetch Operand Fetch Exec emory Access Reg. File Result Store

Partitioning the CPI= Datapath Add registers between smallest steps Exec Next PC PC Fetch Operand Fetch emory Access Reg. File npc_sel ALUSrc ALUctr emrd emwr RegDst RegWr Result Store Allow the instruction to take multiple cycles.

Example ulticycle Datapath Fetch Operand Fetch Result Store em Access Next PC PC IR Ext ALU Reg. File npc_sel ALUSrc ALUctr emrd emwr emtoreg RegDst RegWr Reg File A B S Additional registers are added to store values between stages.

R-type instructions (add, sub,...) inst Logical Register Transfers ADD R[rd] R[rs] + R[rt]; PC PC + 4 inst ADD Physical Register Transfers IR E[PC] A R[rs]; B R[rt] S A + B R[rd] S; PC PC + 4 Next PC PC Inst. em IR Reg File A B Exec S em Access Reg. File

Load instruction inst LW inst LW Logical Register Transfers R[rt] E(R[rs] + sx(im6); PC PC + 4 Physical Register Transfers IR E[PC] A R[rs]; B R[rt] S A + SignEx(Im6) E[S] R[rd] ; PC PC + 4 Next PC PC Inst. em IR Reg File A B Exec S em Access Reg. File

Store instruction inst SW inst SW Logical Register Transfers E(R[rs] + sx(im6) R[rt]; PC PC + 4 Physical Register Transfers IR E[PC] A R[rs]; B R[rt] S A + SignEx(Im6); E[S] B PC PC + 4 Next PC PC Inst. em IR Reg File A B Exec S em Access Reg. File

Branch instruction inst BEQ Logical Register Transfers if R[rs] == R[rt] then PC PC + sx(im6) else PC PC + 4 inst Physical Register Transfers IR E[pc] A R[rs]; B R[rt] Eq =(A - B = = ) BEQ&Eq PC PC + sx(im6) Next PC PC Inst. em IR Reg File A B Exec S em Access Reg. File

ulticycle Implementation Each step in a multicycle implementation will take clock cycle. ulticycle implementation allows a functional unit to be used more than once per instruction, as long as it is used on different clock cycles. Using a functional unit more than once can help to reduce the amount of hardware required. The ability to allow instructions to take different number of clock cycles is another advantage of multicycle implementation.

ulticycle Datapath (Figure 5.26, p.32) PC u x Address emory emdata Write data [25-2] [2-6] [5-] Instr. [5-] register [5-] emory data register u x u x Read reg Read reg 2 Write register Write data Read data Registers Read data 2 Sign extend 6 32 A Shift left 2 B 4 u x 2 3 ALU zero ALU result ALUout Differences between single-cycle and multicycle datapath A single memory unit is used for both instruction and data. There is a single ALU, rather than an ALU and two adders. One or more registers are added after every major functional unit to hold the output of that unit until the value is used in a subsequent clock cycle.

ulticycle Datapath with Control Lines emread IorD emwrite IRWrite RegDst RegWrite ALUSrcA PC u x Address emory emdata Write data [25-2] [2-6] [5-] Instr. [5-] register [5-] emory data register u x u x Read reg Read reg 2 Write register Write data Read data Registers Read data 2 Sign extend 6 32 A Shift left 2 B 4 u x 2 3 ALU zero ALU result ALU control ALUout [5-] emtoreg ALUSrcB ALUop

Figure 5.28: Complete Datapath & Control Signals for ulticycle Implementation (including jump instruction) PCWriteCond PCWrite lord emread emwrite emtoreg IRWrite Outputs Control Op [5-] PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst [25-] 26 28 Shift left 2 Jump address [3-] u 2 x PC u x Address emory emdata Write data [3-26] [25-2] [2-6] [5-] Instr. [5-] register [5-] emory data register u x Read reg Read reg 2 Write register Write data Read data Registers Read data 2 u x Sign extend 6 32 Shift left 2 A B 4 u x u 2 x 3 PC [3-28] ALU control ALU Zero ALU result ALUOut [5-]

Execution Steps () Fetch IR = emory[pc]; PC = PC + 4;

Execution Steps (2) Decode and Register Fetch A = Reg[IR[25..2]]; B = Reg[IR[2..6]]; ALUOut = PC + (signextend(ir[5..]) << 2);

Execution Steps (3) Execution, memory address computation or branch completion emory Reference: ALUOut = A + signextend(ir[5..]); Arithmetic/Logical Operation: ALUOut = A op B; Branch: If (A == B) PC = ALUOut; Jump: PC = PC[3..28] (IR[25..) << 2);

Execution Steps (4) emory access or R-type instruction completion emory Reference: DR = emory[aluout]; or emory[aluout] = B; Arithmetic/Logical s (R-type): Reg[IR[5..]] = ALUOut; Branch, Jump: Nothing

Execution Steps (5) emory Read completion (Load only) Reg[IR[2..6]] = DR;

Finite State achine Control for ulticycle Datapath Start emread ALUSrcA = lord = IRWrite ALUSrcB = ALUOp = PCWrite PCSource = fetch decode/ register fetch ALUSrcA = ALUSrcB = ALUOp = emory address computation Execution Branch completion 2 6 8 9 ALUSrcA = ALUSrcA = ALUSrcA = ALUSrcB = ALUSrcB = ALUSrcB = PCWrite ALUOp = ALUOp = ALUOp = PCWriteCond PCSource = PCSource = (Op = J ) Jump completion (Op = LW ) emory access emory access 3 5 7 R-type completion emread lord = emwrite lord = RegDst = RegWrite emtoreg = 4 emory read completion step RegDst = RegWrite emtoreg =

Implementation of Finite State achine Controller PCWrite Datapath control outputs ALUOp ALUSrcB ALUSrcA RegWrite RegDst Combinational control logic Outputs PCWriteCond IorD emread emwrite IRWrite emtoreg PCSource Datapath control outputs Inputs NS3 NS2 NS NS Op5 Op4 Op3 Op2 Op Op instruction register opcode field S3 S2 S S State register Next State

Logic Equation for Control Signal Outputs Output PCWrite PCWriteCond IorD emread emwrite IRWrite emtoreg PCSource PCSource ALUOp ALUOp ALUSrcB ALUSrcB ALUSrcA RegWrite RegDst Current States state + state9 state8 state3 + state5 state + state3 state5 state state4 state9 state8 state6 state8 state + state2 state + state state2 + state6 + state8 state4 + state7 state7 For Example: PCWrite = S3 S2 S S + S3 S2 S S

Logic Equation for Next State Outputs Output Current States Op NextState state4 + state5 + state7 + state8 + state9 NextState state NextState2 state (Op = lw ) + (Op = sw ) NextState3 state2 (Op = lw ) NextState4 state3 NextState5 state2 (Op = sw ) NextState6 state (Op = R-type ) NextState7 state6 NextState8 state (Op = beq ) NextState9 state (Op = jump ) For Example: NextState = State = S3 S2 S S NextState3 = State2 (Op[5-]= lw ) = S3 S2 S S Op5 Op4 Op3 Op2 Op

Performance Evaluation What is the average CPI? state diagram gives CPI for each instruction type workload gives frequency of each type Type CPI i for type Frequency CPI i x freq i Arith/Logic 4 4%.6 Load 5 3%.5 Store 4 %.4 branch 3 2%.6 Average CPI: 4.

Exceptions and Interrupts Exceptions are exceptional events that disrupt the normal flow of a program Terminology varies between different machines Examples of Interrupts User hitting the keyboard Disk drive asking for attention Arrival of a network packet Examples of Exceptions Divide by zero Overflow Invalid instruction Page fault (non-resident page in memory)

Exception Flow When an exception (or interrupt) occurs, control is transferred to the OS User program Operating System Event exception System exception handler Exception return (optional)

IPS convention Exception means any unexpected change in control flow, without distinguishing internal or external; Use the term interrupt only when the event is externally caused. Type of event From where? IPS terminology I/O device request External Interrupt Invoke OS from user program Internal Exception Arithmetic overflow Internal Exception Using an undefined instruction Internal Exception Hardware malfunctions Either Exception or Interrupt

Handling Exceptions and Interrupts When do we jump to an exception? Upon detection, invoke the OS to service the event What about in the middle of executing a multi-cycle instruction Difficult to abort the middle of an instruction Processor checks for event at the end of every instruction Processor provides EPC & Cause registers to inform OS of cause EPC a 32-bit register used to hold the address of the affected instruction. Cause a register used to record the cause of the exception. To simplify the discussion, assume undefined instruction= arithmetic overflow=

Handling Exceptions and Interrupts Status - interrupt mask and enable bits and determines what exceptions can occur. Control signals to write EPC, Cause, and Status Be able to write exception address into PC, increase mux set PC to exception address (8 8 hex ). ay have to undo PC = PC + 4, since want EPC to point to offending instruction (not its successor); PC = PC 4

Figure 5.39: The multicycle datapath with the addition needed to implement exceptions PCWriteCond PCWrite lord emread emwrite emtoreg IRWrite Outputs Control Op [5-] CauseWrite IntCause EPCWrite PCSource ALUOp ALUSrcB ALUSrcA RegWrite RegDst PC u x Address emory emdata Write data [3-26] [25-2] [2-6] [5-] Instr. [5-] register [5-] emory data register [25-] u x Read reg Read reg 2 Write register Write data Read data Registers Read data 2 u x Sign extend 6 32 Shift left 2 A B 4 u x u 2 x 3 26 28 Shift left 2 PC [3-28] ALU control ALU Zero ALU result Jump address [3-] 8 8 ALUOut u 2 x 3 EPC u x Cause [5-]

How Control Detects Exceptions Undefined detected when no next state is defined from state for the op value. We handle this exception by defining the next state value for all op values other than lw, sw, (R-type), j, and beq as new state. Shown symbolically using other to indicate that the op field does not match any of the opcodes that label arcs out of state. Arithmetic overflow included logic in the ALU to detect overflow, and a signal called Overflow is provided as an output from the ALU. This signal is used in the modified finite state machine to specify an additional possible next state. Note: Challenge in designing control of a real machine is to handle different interactions between instructions and other exceptioncausing events such that control logic remains small and fast. Complex interactions makes the control unit the most challenging aspect of hardware design

Figure 5.4: Finite state machine to handle exception detection Start emread ALUSrcA = lord = IRWrite ALUSrcB = ALUOp = PCWrite PCSource = fetch decode/ Register fetch ALUSrcA = ALUSrcB = ALUOp = emory address computation Execution Branch completion Branch completion 2 6 8 9 ALUSrcA = ALUSrcA = ALUSrcA = ALUSrcB = ALUSrcB = ALUSrcB = PCWrite ALUOp = ALUOp = ALUOp = PCWriteCond PCSource = PCSource = (Op = J ) (Op = LW ) emory emory access access R-type completion 3 5 7 IntCause = CauseWrite emread RegDst = emread ALUSrcA = lord = RegWrite Overflow lord = ALUSrcB = emtoreg = ALUOp = EPCWrite PCWrite PCSource = Write-back step Overflow 4 IntCause = CauseWrite ALUSrcA = ALUSrcB = ALUOp = EPCWrite PCWrite PCSource = RegWrite emtoreg = RegDst =

Summary Disadvantages of the Single Cycle Processor Long cycle time Cycle time is too long for all instructions except the Load ulticycle implementations have the advantage of using a different number of cycles for executing each instruction. ulticycle Processor: Divide the instructions into smaller steps Execute each step (instead of the entire instruction) in one cycle Control is specified by finite state diagram (icroprogramming is used for complex instruction set) The most widely used machine implementation is neither single cycle, nor multicycle it s the pipelined implementation (next improvement we will study).

Optional Homework In Ch.5: 2, 8,, 27, 3, 33, 36, 43