The single cycle CPU

[Figure: the single-cycle datapath with its control unit. The control unit decodes instruction bits [31-26] and drives RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite. The datapath contains the PC, instruction memory, register file, sign-extend unit (16 to 32 bits), ALU with ALU control (instruction bits [5-0]), data memory, the PC + 4 adder, the branch adder (with shift left 2), and the jump address path (instruction bits [25-0] shifted left 2, joined with PC + 4 bits [31-28]).]

Performance of Single-Cycle Machines
Memory unit: 2 ns. ALU and adders: 2 ns. Register file (read or write): 1 ns.

Class      Fetch  Decode  ALU  Memory  Write Back  Total
R-format   2      1       2    0       1           6 ns
LW         2      1       2    2       1           8 ns
SW         2      1       2    2       -           7 ns
Branch     2      1       2    -       -           5 ns
Jump       2      -       -    -       -           2 ns

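What would happen if the clock cycle had a variable length? Let us compare, for a program with the following instruction mix: R-type: 44%, LW: 24%, SW: 12%, BRANCH: 18%, JUMP: 2%
I - number of instructions in the program
T - clock cycle length
CPI - number of cycles per instruction = 1
Execution time = I * T * CPI
Average T with a variable clock = 8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.3 ns, compared with T = 8 ns (the lw latency) when the clock cycle is fixed.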
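The weighted average above can be checked with a short Python sketch. It assumes the per-class latencies of the performance table and the mix R-type 44%, lw 24%, sw 12%, beq 18%, j 2% (the values that sum to 100% and reproduce the stated 6.3 ns); the names `LATENCY_NS`, `MIX` and `average_cycle_ns` are illustrative only.

```python
# Per-class latency in ns (performance table) and the assumed instruction mix.
LATENCY_NS = {"R-type": 6, "lw": 8, "sw": 7, "beq": 5, "j": 2}
MIX = {"R-type": 0.44, "lw": 0.24, "sw": 0.12, "beq": 0.18, "j": 0.02}


def average_cycle_ns(latency, mix):
    """Average clock length if every instruction got exactly the time it needs."""
    return sum(latency[cls] * mix[cls] for cls in mix)


# With a fixed clock, the cycle must fit the slowest instruction (lw).
fixed_ns = max(LATENCY_NS.values())
variable_ns = average_cycle_ns(LATENCY_NS, MIX)
speedup = fixed_ns / variable_ns

print(f"fixed clock:    {fixed_ns} ns")
print(f"variable clock: {variable_ns:.2f} ns")
print(f"speedup:        {speedup:.2f}")
```

Since I and CPI = 1 are the same in both cases, the ratio of the two cycle lengths is also the ratio of the execution times.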

How to save time?
Idea 1: Clock with variable rate
Drawbacks:
- Complex to construct
- Not modular (other components in the system)
- Hardware components sit idle for most of the time
Better Idea: Fixed clock cycle, each instruction takes a different number of clock cycles

Multicycle Approach
The idea behind the multicycle method:
- Saving time: each instruction takes only the number of clock cycles it needs.
- Saving components: the same component is used in different stages of the instruction.

How the multicycle architecture is built
Divide the instruction into stages, each stage taking one cycle:
- Balance the amount of work required in each stage.
- Reduce the amount of work required in each stage: each stage performs only one functional operation.
At the end of each clock cycle:
- Save the values needed by the following stages.
- Add extra internal registers for this purpose.

Timing of a lw instruction in a single cycle CPU
[Timing diagram: one long clock cycle covering fetch (2 ns), decode (1 ns), execute (2 ns), memory (2 ns) and write back (1 ns).]
We want to replace a long single CK cycle with short ones:
Timing of a lw instruction in a multi-cycle CPU
[Timing diagram: five short clock cycles, one each for fetch, decode, execute, memory and write back; the signals shown include A, B, ALUout, Mem data and MDR.]

Therefore we should add registers to the single cycle CPU shown below:
[Figure: single-cycle datapath with adder, register file ([25:21]=Rs, [20:16]=Rt, Rd), sign extension of bits [15:0] from 16 to 32 bits, and data memory (Address, D.In, D.Out).]

Adding registers to split the instruction to stages:
[Figure: the same datapath with the registers A, B, ALUout and MDR added between the stages.]

A multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: multi-cycle datapath with one memory for instructions & data, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, sign extension 16->32 and two shift-left-2 units.]

Let us explain the multi-cycle CPU
First we'll look at a CPU capable of performing only R-type instructions.
Then, we'll add the lw instruction,
and the sw instruction.
Then, the beq instruction.
And finally, the j instruction.

Let us remind ourselves how a single cycle CPU capable of performing R-type instructions works. Here you see the data-path and the timing of an R-type instruction.
[Figure: single-cycle R-type datapath: adder, register file ([25:21]=Rs, [20:16]=Rt, [15:11]=Rd), ALU, bits [31:26] (6 bits) to the control and bits [5:0]=funct (6 bits) to the ALU control.]
[Timing diagram: fetch, decode, execute and write back within one clock cycle.]

A single cycle CPU demo: R-type instruction
[Figure: register file with [25:21]=Rs, [20:16]=Rt, [15:11]=Rd.]

A multi cycle CPU capable of performing R-type instructions
[Figure: memory for instructions & data, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout.]

A multi cycle CPU capable of R-type instructions: fetch
[Figure: the datapath above with the fetch step highlighted.]

A multi cycle CPU capable of R-type instructions: decode
[Figure: the datapath above with the decode step highlighted.]

A multi cycle CPU capable of R-type instructions: execute
[Figure: the datapath above with the execute step highlighted.]

A multi cycle CPU capable of R-type instructions: write back
[Figure: the datapath above with the write back step highlighted.]

Timing of an R-type instruction in a single cycle CPU
[Timing diagram: instruction memory data (= the instruction); Rs, Rt; ALU inputs; ALU output (data = result of the calculation); GPR input; all within one clock cycle covering fetch, decode, execute and write back.]
Timing of an R-type instruction in a multi-cycle CPU
[Timing diagram: memory data; previous instruction, then current instruction; A, B; ALUout; across four clock cycles: fetch, decode, execute, write back.]

[Timing diagram of an R-type instruction in a multi-cycle CPU, annotated with the register transfer of each step:]
fetch: IR = M(PC) (the previous instruction is replaced by the current one)
decode: A = Rs, B = Rt (the GPR outputs)
execute: ALUout = A op B
write back: Rd = ALUout, at the rising edge of CK
An R-type instruction takes 4 CKs.
The state diagram: IR = M(PC) -> A = Rs, B = Rt -> ALUout = A op B -> Rd = ALUout

A multi-cycle CPU capable of R-type instructions (PC calc.)
[Figure: memory for instructions & data, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, with the path that computes the next PC.]

[Timing diagram, now including the PC:]
fetch: IR = M(PC); next PC = current PC + 4
decode: A = Rs, B = Rt (the GPR outputs)
execute: ALUout = A op B
write back: Rd = ALUout, at the rising edge of CK

A multi cycle CPU capable of R-type instructions: fetch
[Figure: the datapath above with the fetch step highlighted.]

The state diagram of a CPU capable of R-type instructions only
State 0, Fetch: IR = M(PC), PC = PC + 4
State 1, Decode: A = Rs, B = Rt
State 6, R-type: ALUout = A op B
State 7, WBR: Rd = ALUout
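The four states can be traced in software. The following is a minimal Python sketch, not the slides' hardware: it assumes the standard MIPS R-type field layout and funct codes (add 0x20, sub 0x22, and 0x24, or 0x25), and the names `run_r_type` and `ALU_OPS` are hypothetical.

```python
# Assumed subset of MIPS funct codes handled by the ALU.
ALU_OPS = {
    0x20: lambda a, b: a + b,   # add
    0x22: lambda a, b: a - b,   # sub
    0x24: lambda a, b: a & b,   # and
    0x25: lambda a, b: a | b,   # or
}


def run_r_type(pc, mem, regs):
    """Step one R-type instruction through states 0, 1, 6 and 7."""
    # State 0 (Fetch): IR = M(PC), PC = PC + 4
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    # State 1 (Decode): A = Rs, B = Rt
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    rd = (ir >> 11) & 0x1F
    funct = ir & 0x3F
    a, b = regs[rs], regs[rt]
    # State 6 (R-type): ALUout = A op B
    alu_out = ALU_OPS[funct](a, b) & 0xFFFFFFFF
    # State 7 (WBR): Rd = ALUout
    regs[rd] = alu_out
    return pc, 4  # new PC and the number of clock cycles used


# add $3, $1, $2 encoded as 0x00221820, stored at address 0.
regs = [0] * 32
regs[1], regs[2] = 5, 7
mem = {0: 0x00221820}
new_pc, cycles = run_r_type(0, mem, regs)
print(regs[3], new_pc, cycles)
```

Each comment marks the point where the hardware would latch a value at a clock edge; in the sketch the intermediate registers are simply local variables.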

The state diagram of a CPU capable of R-type and lw instructions
State 0, Fetch
State 1, Decode (next: AdrCmp for lw, R-type for R-type)
State 2, AdrCmp: ALUout = A + sext(imm)
State 3, Load: MDR = M(ALUout)
State 4, WB: Rt = MDR
State 6, R-type
State 7, WBR

We added registers to split the instruction to stages. Let's discuss the lw instruction.
[Figure: datapath with adder, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, data memory (Address, D.In, D.Out), MDR and sign extension of bits [15:0] from 16 to 32 bits.]

First we draw a multi-cycle CPU capable of R-type & lw instructions:
[Figure: register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, data memory, MDR, sign extension 16->32.]
We just moved the data memory. All parts related to lw only are blue.

A multi-cycle CPU capable of R-type & lw instructions: fetch
[Figure: the datapath above with the fetch step highlighted.]

A multi-cycle CPU capable of R-type & lw instructions: decode
[Figure: the datapath above with the decode step highlighted.]

A multi-cycle CPU capable of R-type & lw instructions: AdrCmp
[Figure: the datapath above with the address computation step highlighted.]

A multi-cycle CPU capable of R-type & lw instructions: memory
[Figure: the datapath above with the memory step highlighted.]

A multi-cycle CPU capable of R-type & lw instructions: WB
[Figure: the datapath above with the write back step highlighted; MDR is written to Rt.]

Can we unite the instruction & data memories? (They are not used simultaneously as in the single cycle CPU)
[Figure: the datapath above, still with separate memories.]

So here is a multi-cycle CPU capable of R-type & lw instructions using a single memory for instructions & data
[Figure: one memory for instructions & data, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, MDR, sign extension 16->32.]

Timing of a lw instruction in a single cycle CPU
[Timing diagram: I.Mem data; Rs, Rt; ALU inputs; ALU output (address); D.Mem adrs; D.Mem data; all within one clock cycle covering fetch, decode, execute, memory and write back.]
Timing of a lw instruction in a multi-cycle CPU
[Timing diagram: previous instruction, then current instruction; A, B; ALUout (data address); Mem data; MDR; data to Rt; across five clock cycles: fetch, decode, execute, memory, write back.]

[Timing diagram of a lw instruction in a multi-cycle CPU, annotated with the register transfer of each step:]
fetch: IR = M(PC), PC = PC + 4
decode: A = Rs, B = Rt (the GPR outputs)
execute: ALUout = A + sext(imm) (the data address)
memory: MDR = M(ALUout)
write back: Rt = MDR, at the rising edge of CK
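The five lw steps can be traced the same way. This is a sketch under stated assumptions: the standard MIPS I-type layout (opcode 0x23 for lw), one Python dict standing in for the single instruction & data memory, and hypothetical helper names `sext16` and `run_lw`.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def run_lw(pc, mem, regs):
    """Step one lw instruction through fetch, decode, AdrCmp, Load and WB."""
    # fetch: IR = M(PC), PC = PC + 4
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    # decode: A = Rs (B = Rt is also read, but lw does not use it)
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    a = regs[rs]
    # execute (AdrCmp): ALUout = A + sext(imm)
    alu_out = (a + sext16(ir & 0xFFFF)) & 0xFFFFFFFF
    # memory (Load): MDR = M(ALUout)
    mdr = mem[alu_out]
    # write back (WB): Rt = MDR
    regs[rt] = mdr
    return pc, 5  # new PC and the number of clock cycles used


# lw $2, -4($1) encoded as 0x8C22FFFC; $1 holds 0x104, so the address is 0x100.
regs = [0] * 32
regs[1] = 0x104
mem = {0: 0x8C22FFFC, 0x100: 0xDEADBEEF}
new_pc, cycles = run_lw(0, mem, regs)
print(hex(regs[2]), new_pc, cycles)
```

The negative offset exercises the sign extension: without it, the computed address would be 0x104 + 0xFFFC instead of 0x100.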

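The state diagram of a CPU capable of R-type and lw instructions
State 0, Fetch: IR = M(PC), PC = PC + 4
State 1, Decode: A = Rs, B = Rt (next: AdrCmp for lw, R-type for R-type)
State 2, AdrCmp: ALUout = A + sext(imm)
State 3, Load: MDR = M(ALUout)
State 4, WB: Rt = MDR
State 6, R-type: ALUout = A op B
State 7, WBR: Rd = ALUout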

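A multi-cycle CPU capable of R-type & lw & sw instructions
[Figure: memory for instructions & data, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, MDR, sign extension of bits [15:0] from 16 to 32 bits, shift left 2 and branch address; the lw and sw paths are marked.]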

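The state diagram of a CPU capable of R-type and lw and sw instructions
State 0, Fetch: IR = M(PC), PC = PC + 4
State 1, Decode: A = Rs, B = Rt (next: AdrCmp for lw+sw, R-type for R-type)
State 2, AdrCmp: ALUout = A + sext(imm) (next: Load for lw, Store for sw)
State 3, Load: MDR = M(ALUout)
State 4, WB: Rt = MDR
State 5, Store: M(ALUout) = B
State 6, R-type: ALUout = A op B
State 7, WBR: Rd = ALUout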

A multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: memory for instructions & data, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, sign extension of bits [15:0] from 16 to 32 bits and two shift-left-2 units.]

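Adding the instruction beq to the state diagram:
State 0, Fetch
State 1, Decode (next: AdrCmp for lw+sw, R-type for R-type, Branch for beq)
State 2, AdrCmp (next: Load for lw, Store for sw)
State 3, Load
State 4, WB
State 5, Store
State 6, R-type
State 7, WBR
State 8, Branch: calc Rs - Rt (just to produce the zero signal)
- zero: calc PC = PC + sext(imm)<<2
- not zero: the PC is left unchanged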

Adding the instruction beq to the state diagram, a more efficient way:
Let's use the decode state, in which the ALU is doing nothing, to compute the branch address. We'll have to store it for 1 more CK cycle, until we know whether to branch or not! (We store it in the ALUout reg.)
State 0, Fetch
State 1, Decode: calc ALUout = PC + sext(imm)<<2 (next: AdrCmp for lw+sw, R-type for R-type, Branch for beq)
State 2, AdrCmp (next: Load for lw, Store for sw)
State 3, Load
State 4, WB
State 5, Store
State 6, R-type
State 7, WBR
State 8, Branch: calc Rs - Rt. If zero, load the PC with ALUout data, else do not load the PC
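A minimal Python sketch of this three-cycle beq follows. It assumes the standard MIPS I-type layout (opcode 4 for beq) and that the PC used in decode is the value already incremented during fetch; `sext16` and `run_beq` are hypothetical names.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def run_beq(pc, mem, regs):
    """Step one beq instruction through Fetch, Decode and Branch."""
    # State 0 (Fetch): IR = M(PC), PC = PC + 4
    ir = mem[pc]
    pc = (pc + 4) & 0xFFFFFFFF
    # State 1 (Decode): A = Rs, B = Rt, and the otherwise idle ALU
    # computes the branch address: ALUout = PC + sext(imm)<<2
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    alu_out = (pc + (sext16(ir & 0xFFFF) << 2)) & 0xFFFFFFFF
    # State 8 (Branch): the ALU computes A - B only to produce the zero signal
    zero = ((a - b) & 0xFFFFFFFF) == 0
    if zero:
        pc = alu_out  # load the PC with the ALUout data
    return pc, 3  # new PC and the number of clock cycles used


# beq $1, $2, 3 encoded as 0x10220003, stored at address 0.
mem = {0: 0x10220003}
taken_regs = [0] * 32
taken_regs[1] = taken_regs[2] = 9
not_taken_regs = [0] * 32
not_taken_regs[1], not_taken_regs[2] = 9, 8
taken_pc, cycles = run_beq(0, mem, taken_regs)
not_taken_pc, _ = run_beq(0, mem, not_taken_regs)
print(taken_pc, not_taken_pc, cycles)
```

Computing the target speculatively in decode costs nothing for the other instruction classes, because they overwrite ALUout in their own execute state.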

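A multi-cycle CPU capable of R-type & lw/sw & branch instructions
[Figure: memory for instructions & data, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, sign extension of bits [15:0] from 16 to 32 bits, shift left 2, and the branch address path back to the PC.]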

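Adding the instruction j to the state diagram:
State 0, Fetch
State 1, Decode (next: AdrCmp for lw+sw, R-type for R-type, Branch for beq, Jump for j)
State 2, AdrCmp (next: Load for lw, Store for sw)
State 3, Load
State 4, WB
State 5, Store
State 6, R-type
State 7, WBR
State 8, Branch
State 9, Jump: PC = PC[31:28] || IR[25:0]<<2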
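The jump address concatenation can be written as bit operations. This is a small sketch with a hypothetical function name `jump_target`; it assumes the PC has already been incremented in the fetch state and that j uses opcode 2 with a 26-bit target field, as in standard MIPS.

```python
def jump_target(pc, ir):
    """PC[31:28] || IR[25:0] << 2, as a 32-bit address."""
    upper = pc & 0xF0000000            # keep PC bits 31..28
    lower = (ir & 0x03FFFFFF) << 2     # 26-bit field becomes a 28-bit byte address
    return upper | lower


# j with target field 0x100, encoded as 0x08000100, executed with PC = 0x10000004.
target = jump_target(0x10000004, 0x08000100)
print(hex(target))
```

The shift by 2 works because instructions are word aligned, so the two low address bits are always zero and need not be stored in the instruction.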

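A multi-cycle CPU capable of R-type & lw/sw & branch & jump instructions
[Figure: memory for instructions & data, register file ([25:21]=Rs, [20:16]=Rt, Rd), A, B, ALUout, sign extension of bits [15:0] from 16 to 32 bits, shift left 2; the PC can be loaded with PC + 4 (next address), the branch address, or the jump address (bits [25:0] shifted left 2, joined with PC bits [31:28]).]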

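Summary of the steps of the different instructions

Step: Instruction fetch (state 0)
- All instructions: IR = Memory[PC]; PC = PC + 4

Step: Instruction decode/register fetch (state 1)
- All instructions: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2)

Step: Execution, address computation, branch/jump completion
- R-type instructions (state 6): ALUOut = A op B
- Memory-reference instructions (state 2): ALUOut = A + sign-extend(IR[15-0])
- Branches (state 8): if (A == B) then PC = ALUOut
- Jumps (state 9): PC = PC[31-28] || (IR[25-0] << 2)

Step: Memory access or R-type completion
- R-type instructions (state 7): Reg[IR[15-11]] = ALUOut
- Memory-reference instructions (state 3 or 5): Load: MDR = Memory[ALUOut], or Store: Memory[ALUOut] = B

Step: Memory read completion (state 4)
- Memory-reference instructions: Load: Reg[IR[20-16]] = MDR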

MultiCycle implementation with Control
[Figure: the complete multi-cycle datapath with its control unit. Control input: Op = instruction bits [31-26]. Control outputs: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst. Datapath: PC, memory (Address, Write data, MemData), instruction register (fields [25-21], [20-16], [15-11], [15-0]), memory data register, register file, sign extend (16 to 32 bits), shift left 2, A, B, ALU with ALU control (instruction bits [5-0]) and Zero output, ALUOut, and the jump address [31-0] formed from instruction bits [25-0] shifted left 2 (26 to 28 bits) joined with PC [31-28].]

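Final State Machine
State 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next: state 1.
State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: state 2 if (Op = 'LW') or (Op = 'SW'); state 6 if (Op = R-type); state 8 if (Op = 'BEQ'); state 9 if (Op = 'J').
State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: state 3 if (Op = 'LW'); state 5 if (Op = 'SW').
State 3, Memory access: MemRead, IorD = 1. Next: state 4.
State 4, Write-back step: RegDst = 0, RegWrite, MemtoReg = 1. Next: state 0.
State 5, Memory access: MemWrite, IorD = 1. Next: state 0.
State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: state 7.
State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0. Next: state 0.
State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Next: state 0.
State 9, Jump completion: PCWrite, PCSource = 10. Next: state 0.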
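The transitions of this state machine can be encoded as a next-state function, which also gives the number of clock cycles per instruction class. The sketch below is illustrative: `next_state`, `states_for` and `MIX` are hypothetical names, and the mix (R-type 44%, lw 24%, sw 12%, beq 18%, j 2%) is the one assumed in the single-cycle comparison.

```python
# Assumed instruction mix, as in the single-cycle performance comparison.
MIX = {"R-type": 0.44, "lw": 0.24, "sw": 0.12, "beq": 0.18, "j": 0.02}


def next_state(state, op):
    """Next state of the control FSM; 0 means 'return to fetch'."""
    if state == 0:
        return 1
    if state == 1:
        return {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}[op]
    if state == 2:
        return {"lw": 3, "sw": 5}[op]
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8 and 9 complete the instruction


def states_for(op):
    """Sequence of states visited by one instruction of class op."""
    path = [0]
    while True:
        nxt = next_state(path[-1], op)
        if nxt == 0:
            return path
        path.append(nxt)


# One state per clock cycle, so the path length is the cycle count.
cycles = {op: len(states_for(op)) for op in MIX}
cpi = sum(MIX[op] * cycles[op] for op in MIX)

for op in MIX:
    print(op, states_for(op), cycles[op])
print(f"average CPI: {cpi:.2f}")
```

Under these assumptions lw takes 5 cycles, R-type and sw take 4, and beq and j take 3, so the average CPI depends directly on how load-heavy the program is.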

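The final state diagram:
State 0, Fetch -> Decode
State 1, Decode -> AdrCmp (lw+sw), R-type, Branch (beq), Jump (j)
State 2, AdrCmp -> Load (lw), Store (sw)
State 3, Load -> WB
State 4, WB -> Fetch
State 5, Store -> Fetch
State 6, R-type -> WBR
State 7, WBR -> Fetch
State 8, Branch -> Fetch
State 9, Jump -> Fetch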

End of multi-cycle implementation