Single vs. Multi-cycle Implementation
- Multicycle: instructions take several faster cycles
- For this simple version, the multi-cycle implementation could be as much as 1.27 times faster (for a typical instruction mix)
- Suppose we had floating-point operations
  - Floating point has very high latency
  - E.g., a floating-point multiply may take 16 ns, while an integer add may take 2 ns
  - So the clock cycle is constrained by the 16 ns of FP
- Suppose a program doesn't do ANY floating point?
  - The performance penalty is too big to tolerate
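The clock-period argument above can be sketched in a few lines of Python. Only the 16 ns floating-point multiply and the 2 ns integer add come from the slide; the `load` and `store` latencies and the function name `single_cycle_period` are assumptions added for illustration.

```python
# Latencies in nanoseconds. Only fp_multiply (16) and int_add (2) are from
# the slide; load and store are hypothetical values for illustration.
latency_ns = {"int_add": 2, "load": 8, "store": 7, "fp_multiply": 16}

def single_cycle_period(latencies):
    """A single-cycle clock must be at least as long as the slowest operation."""
    return max(latencies.values())

period = single_cycle_period(latency_ns)
# Time an integer add spends idle in each cycle of the single-cycle design.
idle_on_int_add = period - latency_ns["int_add"]
print(period, idle_on_int_add)
```

Under these assumed numbers, every integer add would occupy a 16 ns cycle while doing 2 ns of work, which is the penalty the slide calls too big to tolerate.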

Multi-cycle Implementation
- Single memory unit (I and D), single ALU
- Several temporary registers (IR, MDR, A, B, ALUOut)
- Temporaries hold the output value of an element so that the value can be used on a subsequent cycle
- Values needed by a subsequent instruction are stored in programmer-visible state (memory, RF)

A single ALU
- The single ALU must accommodate all inputs that used to go to three different ALUs in the single-cycle implementation
  1. A multiplexor on the first ALU input selects the A register (from RF) or the PC
  2. A multiplexor on the second ALU input selects among the constant 4 (PC increment), the sign-extended value, the shifted offset field, and the RF input
- Trade-off: additional multiplexors (and time), but only a single ALU, since it can be shared across cycles
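As a rough sketch of the two input multiplexors, the Python below picks ALU operands from select values. The encodings (ALUSrcA 0 = PC, 1 = A; ALUSrcB 00 = B, 01 = constant 4, 10 = sign-extended value, 11 = shifted offset) are inferred from the control tables later in these slides; the function names `sign_extend16` and `alu_inputs` are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def alu_inputs(alu_src_a, alu_src_b, pc, a, b, imm16):
    """Return the two ALU operands chosen by the input multiplexors."""
    first = a if alu_src_a == 1 else pc       # first mux: A register or PC
    second = {
        0b00: b,                              # register file value (B)
        0b01: 4,                              # constant 4 for PC increment
        0b10: sign_extend16(imm16),           # sign-extended immediate
        0b11: sign_extend16(imm16) << 2,      # shifted offset (branch target)
    }[alu_src_b]                              # second mux: four-way select
    return first, second

# Fetch cycle: PC + 4
print(alu_inputs(0, 0b01, pc=100, a=7, b=9, imm16=0))
```

The same function covers every cycle of every instruction, which is the point of the trade-off: two muxes replace two extra adders.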

Multi-cycle Datapath with Control
- Datapath with additional muxes, temporary registers, and new control signals
- Most temporaries (except IR) are updated on every cycle, so no write control is required (always write)

Multi-cycle Steps - Instruction Fetch
- Instruction fetch
    IR = Memory[PC];
    PC = PC + 4;
- Operation
  - Send PC to memory as the address
  - Read instruction from memory
  - Write instruction into IR for use on the next cycle
  - Increment PC by 4
    - Uses the ALU in this first cycle
    - Set control signals to send PC and the constant 4 to the ALU

Multi-cycle Steps - Instruction Decode
- Don't yet know what the instruction is
  - Decode the instruction concurrently with the RF read
  - Optimistically read registers
  - Optimistically compute the branch target
  - We'll select the right answer on the next cycle
- Decode and register file read
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);

Multi-cycle Steps - Execution
- Operation varies based on instruction decode
- Memory reference:
    ALUOut = A + sign-extend(IR[15-0]);
- Arithmetic-logical instruction:
    ALUOut = A op B;
- Branch:
    if (A == B) PC = ALUOut;
- Jump:
    PC = PC[31-28] || (IR[25-0] << 2);

Multi-cycle Steps - Memory / Completion
- A load/store accesses memory, or an arithmetic instruction writes its result to the register file
- Memory reference:
    MDR = Memory[ALUOut];   (load)
    or
    Memory[ALUOut] = B;     (store)
- Arithmetic-logical instruction:
    Reg[IR[15-11]] = ALUOut;

Multi-cycle Steps - Read completion
- Finish a memory read by writing the value read into the register file
- Load operation:
    Reg[IR[20-16]] = MDR;
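To make the five steps concrete, here is a minimal Python sketch that walks a single load through them using the register transfers from the slides. It is a simplification under stated assumptions: the instruction word is not modeled (the parameters `rs`, `rt`, and `imm` stand in for the decoded fields of IR), memory and the register file are plain dictionaries, and the function name `run_load` is hypothetical.

```python
def run_load(pc, regs, memory, rs, rt, imm):
    """Step one lw through the five cycles of the multi-cycle datapath."""
    # Step 1, instruction fetch: IR = Memory[PC]; PC = PC + 4
    # (IR itself is not modeled; rs, rt, imm are its decoded fields.)
    pc = pc + 4
    # Step 2, decode and register read: A = Reg[IR[25-21]]
    a = regs[rs]
    # Step 3, execution: ALUOut = A + sign-extend(IR[15-0])
    alu_out = a + imm
    # Step 4, memory access: MDR = Memory[ALUOut]
    mdr = memory[alu_out]
    # Step 5, read completion: Reg[IR[20-16]] = MDR
    regs[rt] = mdr
    return pc, regs

# Hypothetical state: register 16 holds address 1000, which contains 10.
regs = {16: 1000, 8: 0}
memory = {1000: 10, 1004: 20}
new_pc, regs = run_load(0, regs, memory, rs=16, rt=8, imm=4)
print(new_pc, regs[8])
```

Each commented step corresponds to one clock cycle, with the local variables `a`, `alu_out`, and `mdr` playing the role of the temporary registers A, ALUOut, and MDR.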

Multi-cycle Steps
- Instructions always do the first two steps
- Branches can finish in the third step
- Arithmetic-logical instructions can finish in the fourth step
- Stores can finish in the fourth step
- Loads finish in the fifth step

Instruction          Number of cycles
Branch / Jump        3
Arithmetic-logical   4
Stores               4
Loads                5

Multi-cycle vs. Single cycle?
- Why does it help?
- Let's consider a simple example... (in-class example)

Example
        .data
A:      .word 10,20,30,40,50,60,70,80,90,100
B:      .word 0,0,0,0,0,0,0,0,0,0
        .text
        la   $s0,A          # address of A
        li   $s1,10         # multiplier for A[i] * 10
        li   $s2,10         # iteration count
loop:   lw   $t0,0($s0)     # read A[i]
        mul  $t0,$t0,$s1    # $t0 = A[i]*10
        sw   $t0,40($s0)    # write result to B[i] (40 bytes past A[i])
        addi $s0,$s0,4      # next element
        addi $s2,$s2,-1     # decrement iteration count
        bne  $s2,$0,loop    # done?
        li   $v0,10         # exit system call
        syscall             # syscall

Example
- How much time does it take to execute this program on the single-cycle and multi-cycle implementations?
- Assume: the single-cycle clock rate is 100 MHz

Example
        .data
A:      .word 10,20,30,40,50,60,70,80,90,100
B:      .word 0,0,0,0,0,0,0,0,0,0
        .text                                                 # instr. count
        la   $s0,A          # address of A                       2
        li   $s1,10         # multiplier for A[i] * 10           1
        li   $s2,10         # iteration count                    1
loop:   lw   $t0,0($s0)     # read A[i]                          10
        mul  $t0,$t0,$s1    # $t0 = A[i]*10                      10
        sw   $t0,40($s0)    # write result to B[i]               10
        addi $s0,$s0,4      # next element                       10
        addi $s2,$s2,-1     # decrement iteration count          10
        bne  $s2,$0,loop    # done?                              10
        li   $v0,10         # exit system call                   1
        syscall             # syscall                            1
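The instruction counts in the right-hand column can be tallied mechanically. The sketch below is one way to do it in Python; it takes the slide's counts at face value, including the assumption that the `la` pseudo-instruction expands to two machine instructions. The variable names are illustrative only.

```python
ITERATIONS = 10  # the loop body runs once per array element

# (mnemonic, machine instructions per execution, times executed)
program = [
    ("la", 2, 1),            # pseudo-instruction, counted as 2 on the slide
    ("li", 1, 1),
    ("li", 1, 1),
    ("lw", 1, ITERATIONS),
    ("mul", 1, ITERATIONS),
    ("sw", 1, ITERATIONS),
    ("addi", 1, ITERATIONS),
    ("addi", 1, ITERATIONS),
    ("bne", 1, ITERATIONS),
    ("li", 1, 1),
    ("syscall", 1, 1),
]

total_instructions = sum(size * times for _, size, times in program)
print(total_instructions)
```

The six loop instructions dominate the total: 60 of the dynamic instructions come from the loop body and only 6 from setup and exit.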

Example
- How much time on SINGLE cycle?
- Every instruction type takes 1 clock cycle
- The clock rate is 100 MHz
- Clock cycle length is 1 / 100 MHz = 10 ns
- Sum up the total number of instructions: 66
- Thus, 66 instructions * 1 cycle each * 10 ns per cycle = 660 ns

Example
- How much time on multi-cycle?
- To answer this, we need to know (1) the clock cycle length for the multi-cycle implementation, and (2) how many instructions of each type are executed
- (1) Suppose the ideal circumstance: we divide the single cycle into 5 shorter (faster) cycles
    Multi-cycle clock cycle = 10 ns / 5 cycles = 2 ns

Example
- (2) How many instructions of each type?
- Easy: just look at the program and count them

Type         Count
Arithmetic   36
Loads        10
Branches     10
Stores       10

- Now we need to compute how much time. The time is the sum, over all types, of the number of instructions of that type multiplied by the number of cycles for that type, multiplied by the clock cycle time.

Example
- So, in this case, we have:
      36 arithmetic * 4 cycles * 2 ns
    + 10 loads      * 5 cycles * 2 ns
    + 10 branches   * 3 cycles * 2 ns
    + 10 stores     * 4 cycles * 2 ns
    = 288 ns + 100 ns + 60 ns + 80 ns
    = 528 ns
- Thus, the multi-cycle implementation takes 528 ns and the single-cycle implementation takes 660 ns.

Example
- How much faster is the multi-cycle implementation?
- Take the ratio of the single-cycle time to the multi-cycle time
- Thus, we have: 660 ns / 528 ns = 1.25 times faster
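The whole comparison can be reproduced with a short Python sketch. The cycle counts, instruction counts, and clock periods are the ones given on the slides; the variable names are illustrative.

```python
# Cycles per instruction class in the multi-cycle design (from the slides).
CYCLES = {"arithmetic": 4, "load": 5, "store": 4, "branch": 3}
# Dynamic instruction counts for the example program (from the slides).
counts = {"arithmetic": 36, "load": 10, "store": 10, "branch": 10}

single_clock_ns = 10                   # 1 / 100 MHz
multi_clock_ns = single_clock_ns // 5  # ideal split into 5 shorter cycles

# Single cycle: every instruction takes exactly one long cycle.
single_time_ns = sum(counts.values()) * single_clock_ns
# Multi-cycle: each class takes its own number of short cycles.
multi_cycles = sum(counts[k] * CYCLES[k] for k in counts)
multi_time_ns = multi_cycles * multi_clock_ns

speedup = single_time_ns / multi_time_ns
print(single_time_ns, multi_time_ns, speedup)
```

Changing the `counts` mix shows where the gain comes from: a program made only of loads would need 5 short cycles per instruction and see no benefit, while branch-heavy code benefits most.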

Multi-cycle Instruction Execution
- Branch
    Cycle0: IR=Memory[PC]; PC=PC+4;
    Cycle1: ALUout=PC+(sign-extend(IR[15-0])<<2);
    Cycle2: if (A == B) PC=ALUout;
- Arithmetic
    Cycle0: IR=Memory[PC]; PC=PC+4;
    Cycle1: A=Reg[IR[25-21]]; B=Reg[IR[20-16]];
    Cycle2: ALUout = A op B;
    Cycle3: Reg[IR[15-11]]=ALUout;

Multi-cycle Instruction Execution
- Load
    Cycle0: IR=Memory[PC]; PC=PC+4;
    Cycle1: A=Reg[IR[25-21]];
    Cycle2: ALUout = A + sign-extend(IR[15-0]);
    Cycle3: MDR=Memory[ALUout];
    Cycle4: Reg[IR[20-16]]=MDR;

Multi-cycle Control
- What are the control signals in each state for these instructions?
  - Arithmetic
  - Load
  - Store
  - Branch

Multi-cycle Datapath with Control (datapath figure)

Multi-cycle Control
- How are control signals generated on each cycle?
- What are the transitions between cycles? (i.e., what happens next?)
- Control signals
  - IorD, MemRead, MemWrite, IRWrite, RegDst
  - MemtoReg, RegWrite, ALUSrcA
  - ALUSrcB, ALUOp
  - PCWrite
- Transitions from Decode are based on the opcode
- Transitions from Eff. Addr. depend on load vs. store

Multi-cycle Control
- Finite state machine
  - Each cycle: advance one state
  - In a state: set datapath control
  - Make decisions based on the opcode
  - Control differs after Decode (e.g., load, jump)
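A minimal Python sketch of the next-state logic follows. The state names are taken from the control tables in these slides (FETCH, DECODE, EXE ALU, WB ALU, EFF AD, MEM RD, WB MEM, MEM WR, COMPARE); the function names `next_state` and `states_for` and the string labels for instruction classes are assumptions, and jump is left out because the slides give no control table for it.

```python
def next_state(state, instr_class):
    """Return the state that follows `state` for the given instruction class."""
    if state == "FETCH":
        return "DECODE"                      # every instruction decodes next
    if state == "DECODE":                    # the opcode decides the branch
        return {
            "arithmetic": "EXE_ALU",
            "load": "EFF_ADDR",
            "store": "EFF_ADDR",
            "branch": "COMPARE",
        }[instr_class]
    if state == "EXE_ALU":
        return "WB_ALU"
    if state == "EFF_ADDR":                  # load and store diverge here
        return "MEM_RD" if instr_class == "load" else "MEM_WR"
    if state == "MEM_RD":
        return "WB_MEM"
    # WB_ALU, MEM_WR, WB_MEM and COMPARE finish the instruction.
    return "FETCH"

def states_for(instr_class):
    """List the states one instruction visits, starting at FETCH."""
    state, path = "FETCH", ["FETCH"]
    while True:
        state = next_state(state, instr_class)
        if state == "FETCH":
            return path
        path.append(state)

for cls in ("branch", "arithmetic", "store", "load"):
    print(cls, states_for(cls))
```

The length of each path matches the cycle counts given earlier: 3 for a branch, 4 for arithmetic and store, 5 for a load.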

Control for addition (arithmetic)
State (cycle number, advance each cycle); state 5 is not used.

CONTROL    FETCH(1)  DECODE(2)  EXE ALU(3)  WB ALU(4)
IorD       0         X          X           X
MemRead    1         0          0           0
MemWrite   0         0          0           0
IRWrite    1         0          0           0
RegDst     X         X          X           1
MemToReg   X         X          X           0
RegWrite   0         0          0           1
ALUSrcA    0         0          1           X
ALUSrcB    01        11         00          X
ALUOp      00        00         10          X
PCWrite    1         0          0           0

Control for memory (load)
State (cycle number, advance each cycle)

CONTROL    FETCH(1)  DECODE(2)  EFF AD(3)  MEM RD(4)  WB MEM(5)
IorD       0         X          X          1          X
MemRead    1         0          0          1          0
MemWrite   0         0          0          0          0
IRWrite    1         0          0          0          0
RegDst     X         X          X          X          0
MemToReg   X         X          X          X          1
RegWrite   0         0          0          0          1
ALUSrcA    0         0          1          X          X
ALUSrcB    01        11         10         X          X
ALUOp      00        00         00         X          X
PCWrite    1         0          0          0          0

Control for memory (store)
State (cycle number, advance each cycle)

CONTROL    FETCH(1)  DECODE(2)  EFF AD(3)  MEM WR(4)
IorD       0         X          X          1
MemRead    1         0          0          0
MemWrite   0         0          0          1
IRWrite    1         0          0          0
RegDst     X         X          X          X
MemToReg   X         X          X          X
RegWrite   0         0          0          0
ALUSrcA    0         0          1          X
ALUSrcB    01        11         10         X
ALUOp      00        00         00         X
PCWrite    1         0          0          0

Control for branch
State (cycle number, advance each cycle); states 4 and 5 are not used.

CONTROL    FETCH(1)  DECODE(2)  COMPARE(3)
IorD       0         X          X
MemRead    1         0          0
MemWrite   0         0          0
IRWrite    1         0          0
RegDst     X         X          X
MemToReg   X         X          X
RegWrite   0         0          0
ALUSrcA    0         0          1
ALUSrcB    01        11 (3)     00 (0)
ALUOp      00        00 (Add)   01 (Sub)
PCWrite    1         0          0

Finite state machine (FSM)
- Need a way to specify control per cycle
- FSM: tracks the step of execution to generate control signals
- Implementation: generally hardwired or microcode

Traffic light control example
- Two states
  - NSgreen: green light on the North-South road
  - EWgreen: green light on the East-West road
- Sensors (inputs) in each lane to detect a car
  - NScar: a car in either the north- or south-bound lanes
  - EWcar: a car in either the east- or west-bound lanes
- Control signals (outputs) to each light
  - NSlite: 0 is red, 1 is green
  - EWlite: 0 is red, 1 is green
- The current state lasts for 30 seconds, then
  - Switch to the other state if there is a car waiting
  - Stay in the current state for another 30 seconds if not
- We use a 1/30 Hz clock (Hz is clock cycles per second)
  - I.e., determine a new state (possibly the current one) every thirty seconds

Traffic light control example
- Let's encode the NSgreen state as 0 and the EWgreen state as 1
    $NextState = (\overline{CurrentState} \cdot EWcar) + (CurrentState \cdot \overline{NScar})$
    $NSlite = \overline{CurrentState}$
    $EWlite = CurrentState$
- See carfsm.circ on the 447 web site
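The traffic-light controller is small enough to simulate directly. The Python sketch below assumes that state 0 means NSgreen and state 1 means EWgreen, and that the next-state equation is (NOT CurrentState AND EWcar) OR (CurrentState AND NOT NScar), with NSlite = NOT CurrentState and EWlite = CurrentState; that reading restores the complement bars that appear to have been lost from the slide text. The function names and the input sequence are hypothetical.

```python
def next_state(current, ns_car, ew_car):
    """State 0 = NSgreen, state 1 = EWgreen (assumed encoding)."""
    return int((not current and ew_car) or (current and not ns_car))

def outputs(current):
    """Light outputs: 1 is green, 0 is red."""
    return {"NSlite": int(not current), "EWlite": int(current)}

def simulate(inputs, state=0):
    """Apply one (ns_car, ew_car) sample per 30-second clock tick."""
    history = [state]
    for ns_car, ew_car in inputs:
        state = next_state(state, ns_car, ew_car)
        history.append(state)
    return history

# No cars, then an east-west car arrives, then a north-south car arrives.
trace = simulate([(0, 0), (0, 1), (0, 0), (1, 0)])
print(trace, outputs(trace[-1]))
```

In this trace the light stays on NSgreen until an east-west car is sensed, holds EWgreen while no north-south car is waiting, and switches back once one appears.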