Single Cycle Datapath


Lecture notes from MKP, H. H. Lee and S. Yalamanchili

Reading
- Sections 4.1-4.4
- Appendices B.3, B.7, B.8, B.11, D.2
- Note: Appendices A-E in the hardcopy text correspond to chapters 7-11 in the online text.
- Practice Problems: 1, 4, 6, 9

Introduction
- We will examine two MIPS implementations
  - A simplified version -> this module
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j

Instruction Execution
- PC -> instruction memory, fetch instruction
- Register numbers -> register file, read registers
- Depending on instruction class
  1. Use ALU to calculate
     - Arithmetic result
     - Memory address for load/store
     - Branch target address
  2. Access data memory for load/store
  3. PC <- an address or PC + 4

[Figure: PC supplying the Address into a memory holding "An Encoded Program": 8d0b0000 014b5020 21080004 2129ffff 1520fffc 000a082a ...]
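As a rough illustration of what the datapath does with each fetched word, the sketch below splits the encoded program from the figure into the standard MIPS instruction fields. The function name `decode` and the assumption that the program starts at address 0 are illustrative choices, not something the slides specify.

```python
# Minimal sketch: split 32-bit MIPS instruction words into their fields.
# Field positions follow the standard MIPS R/I formats; the starting
# address of 0 is an assumption made only for printing.

def decode(word):
    """Return the MIPS fields of a 32-bit instruction word as a dict."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31:26
        "rs":    (word >> 21) & 0x1F,  # bits 25:21
        "rt":    (word >> 16) & 0x1F,  # bits 20:16
        "rd":    (word >> 11) & 0x1F,  # bits 15:11 (meaningful for R-format)
        "shamt": (word >> 6) & 0x1F,   # bits 10:6  (meaningful for R-format)
        "funct": word & 0x3F,          # bits 5:0   (meaningful for R-format)
        "imm":   word & 0xFFFF,        # bits 15:0  (meaningful for I-format)
    }

# The encoded program shown on the slide.
program = [0x8d0b0000, 0x014b5020, 0x21080004,
           0x2129ffff, 0x1520fffc, 0x000a082a]

for index, word in enumerate(program):
    f = decode(word)
    print(f"{4 * index:2d}: {word:08x}  op={f['op']:2d} rs={f['rs']:2d} "
          f"rt={f['rt']:2d} rd={f['rd']:2d} funct={f['funct']:2d} "
          f"imm={f['imm']:#06x}")
```

The first word decodes to opcode 35 (lw) with rs = 8 and rt = 11, and the second to opcode 0 with funct 32 (add), which matches the instruction classes listed in the subset above.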

Basic Ingredients
- Include the functional units we need for each instruction: combinational and sequential

[Figure: datapath building blocks. Instruction memory (instruction address in, instruction out), program counter (PC) and adder (Add, Sum). Register file (three 5-bit register numbers: read register 1, read register 2, write register; write data; read data 1; read data 2; RegWrite) and ALU (3-bit ALU control, Zero, ALU result). Data memory unit (address, write data, read data, MemWrite, MemRead) and sign-extension unit (16 bits in, 32 bits out).]

Sequential Elements (4.2, B.7, B.11)
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1

[Figure: D flip-flop built from two D latches, with a timing diagram of Clk, D and Q marking the rising and falling edges]

Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later

[Figure: D flip-flop with a Write input, and a timing diagram of Clk, Write, D and Q over one cycle time]

Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period
- Synchronous vs. asynchronous operation
- Recall: critical path delay
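The register with write control described above can be sketched behaviorally as follows. This is a simplified software model, not a circuit description: the class name `Register` and its `tick` method are hypothetical, and setup and hold times are ignored.

```python
# Behavioral sketch of a rising-edge-triggered register with write control.
# The stored value changes only when clk goes 0 -> 1 while write == 1.

class Register:
    def __init__(self, value=0):
        self.q = value      # stored value (output Q)
        self._clk = 0       # previous clock level, used to detect edges

    def tick(self, clk, d, write=1):
        """Present clock level clk and input d; return the output Q."""
        rising_edge = self._clk == 0 and clk == 1
        if rising_edge and write == 1:
            self.q = d
        self._clk = clk
        return self.q


reg = Register()
print(reg.tick(1, 5, write=0))  # rising edge, but write is 0: value kept
print(reg.tick(0, 5))           # falling edge: value kept
print(reg.tick(1, 5, write=1))  # rising edge with write = 1: value updated
```

Holding `clk` high and changing `d` leaves the output unchanged, which is the property that lets combinational logic settle between clock edges.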

Register File (B.8)
- Built using D flip-flops (remember ECE 2020!)

[Figure: read ports. Two multiplexers, each selected by a read register number, choose among Register 0, Register 1, ..., Register n - 1, Register n to drive read data 1 and read data 2.]

Register File
- Note: we still use the real clock to determine when to write

[Figure: write port. A decoder on the write register number, combined with the Write signal, drives the clock input C of Register 0, Register 1, ..., Register n - 1, Register n; the register data input feeds the D input of every register.]
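A behavioral sketch of the register file may help tie the two figures together: reads are combinational (the multiplexers), while a write takes effect only at a clock edge when the write control is asserted (the decoder plus Write). The class and method names are hypothetical, and keeping register 0 at zero is the usual MIPS convention rather than something shown on these slides.

```python
# Behavioral sketch of a 32-entry register file with two read ports
# and one write port. Assumption: register 0 always reads as zero
# (MIPS convention).

class RegisterFile:
    def __init__(self, count=32):
        self.regs = [0] * count

    def read(self, read_reg1, read_reg2):
        """Combinational read: returns (read data 1, read data 2)."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        """Called once per rising clock edge; writes only if reg_write is 1."""
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_reg=8, write_data=42, reg_write=1)
rf.clock_edge(write_reg=9, write_data=7, reg_write=0)   # ignored: write control is 0
print(rf.read(8, 9))
```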

Building a Datapath (4.3)
- Datapath
  - Elements that process data and addresses in the CPU
    - Registers, ALUs, muxes, memories, ...
- We will build a MIPS datapath incrementally
  - Refining the overview design

High Level Description
[Figure: Control, Fetch Instructions, Execute Instructions, Memory Operations]
- Single instruction, single data stream model of execution
- Serial execution model
- Commonly known as the von Neumann execution model
  - Stored program model
  - Instructions and data share memory

Instruction Fetch
[Figure: the PC (a 32-bit register) addresses the instruction memory while an adder increments it by 4 for the next instruction. The clk timing diagram marks the start and completion of an instruction fetch within one cycle time.]

R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result

Format: op | rs | rt | rd | shamt | funct

Executing R-Format Instructions
[Figure: register file (three 5-bit register numbers: read register 1, read register 2, write register; write data; read data 1; read data 2; RegWrite) feeding the ALU (3-bit ALU control, Zero, ALU result)]

Format: op | rs | rt | rd | shamt | funct

Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: read memory and update register
- Store: write register value to memory

Format: op | rs | rt | 16-bit constant
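The load/store address calculation above amounts to sign-extending the 16-bit offset and adding it to the base register. A minimal sketch, with hypothetical helper names `sign_extend16` and `effective_address`:

```python
# Sketch of the load/store address calculation: sign-extend the 16-bit
# offset to 32 bits, then add it to the base register value with the ALU.

def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """Base register value plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


print(hex(effective_address(0x1000, 0x0008)))  # positive offset
print(hex(effective_address(0x1000, 0xFFFC)))  # offset field encodes -4
```

Without the sign extension, the offset field 0xFFFC would be treated as +65532 rather than -4, which is why the datapath needs the dedicated sign-extension unit.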

Executing I-Format Instructions
[Figure: register file (read register 1, read register 2, write register, RegWrite), 16-to-32-bit sign-extend unit, and data memory (Address, Write data, Read data, MemWrite, MemRead)]

Format: op | rs | rt | 16-bit constant

Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4 (already calculated by instruction fetch)

Format: op | rs | rt | 16-bit constant
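The branch steps listed above can be sketched as two small functions. The names `branch_target` and `next_pc` are hypothetical, and the model covers only beq.

```python
# Sketch of beq handling: compare by subtracting and checking Zero,
# and compute the target as (PC + 4) + (sign-extended displacement << 2).

def branch_target(pc, imm16):
    """Target address for a taken branch located at address pc."""
    offset = imm16 & 0xFFFF
    if offset & 0x8000:            # sign-extend the 16-bit displacement
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF   # word -> byte displacement

def next_pc(pc, imm16, rs_value, rt_value):
    """beq: take the branch only if the ALU's Zero output is 1."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF


print(next_pc(16, 0xFFFC, 3, 3))   # operands equal: branch taken
print(next_pc(16, 0xFFFC, 3, 4))   # operands differ: fall through to PC + 4
```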

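Branch Instructions
[Figure: branch datapath. The shift-left-2 unit just re-routes wires; in the sign-extend unit the sign-bit wire is replicated.]

Format: op | rs | rt | 16-bit constant

Updating the Program Counter
[Figure: computation of the branch address. The PC feeds the instruction memory address and an adder (+4); a second adder sums PC + 4 with the sign-extended (16 to 32 bits), shifted-left-by-2 Instruction[15-0]; a mux controlled by Branch selects input 0 (PC + 4) or input 1 (branch target). Instruction memory outputs Instruction[31-0], Instruction[25-21], Instruction[20-16], Instruction[15-11], Instruction[15-0].]

    loop: beq  $t0, $0, exit
          addi $t0, $t0, -1
          lw   $a0, arg1($t1)
          lw   $a1, arg2($t2)
          jal  func
          add  $t3, $t3, $0
          addi $t1, $t1, 4
          addi $t2, $t2, 4
          j    loop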

Composing the Elements
- First-cut data path does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

[Figure: PC supplying the Address into a memory holding "An Encoded Program": 014b5020 21080004 2129ffff 1520fffc 000a082a ...]

Full Single Cycle Datapath
[Figure: full single cycle datapath]
- Destination register is instruction-specific
  - lw $t0, 0($t4) vs. add $t0, $t1, $t2

ALU Control (4.4, D.2)
- ALU used for
  - Load/Store: Function = add
  - Branch: Function = subtract
  - R-type: Function depends on funct field

    ALU control   Function
    000           AND
    001           OR
    010           add
    110           subtract
    111           set-on-less-than

ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

    opcode   ALUOp   Operation          funct    ALU function       ALU control
    lw       00      load word          XXXXXX   add                010
    sw       00      store word         XXXXXX   add                010
    beq      01      branch equal       XXXXXX   subtract           110
    R-type   10      add                100000   add                010
                     subtract           100010   subtract           110
                     AND                100100   AND                000
                     OR                 100101   OR                 001
                     set-on-less-than   101010   set-on-less-than   111

    (X = don't care)

- How do we turn this description into gates?
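Before turning the table into gates, it can be useful to see it as a plain lookup. The sketch below encodes the two tables above and pairs them with a small 32-bit ALU model; the function names and the signed interpretation helper are illustrative assumptions, and an unknown funct value simply raises an error.

```python
# Sketch: ALU control derived from ALUOp and funct, plus a 32-bit ALU model.

FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set-on-less-than
}

def alu_control(alu_op, funct=0):
    """Map the 2-bit ALUOp (and funct, for R-type) to the 3-bit ALU control."""
    if alu_op == 0b00:          # lw / sw: add
        return 0b010
    if alu_op == 0b01:          # beq: subtract
        return 0b110
    if alu_op == 0b10:          # R-type: depends on funct
        return FUNCT_TO_CONTROL[funct]
    raise ValueError("ALUOp value not used in this subset")

def to_signed(x):
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def alu(control, a, b):
    """Return (32-bit result, Zero flag) for the given ALU control code."""
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = a + b
    elif control == 0b110:
        result = a - b
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unsupported ALU control code")
    result &= 0xFFFFFFFF
    return result, int(result == 0)


print(alu(alu_control(0b01), 7, 7))            # beq compare: Zero is set
print(alu(alu_control(0b10, 0b100000), 2, 3))  # R-type add
```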

ALU Controller

              ALUOp            Funct field         ALU
              ALUOp1  ALUOp0   F5 F4 F3 F2 F1 F0   control
    lw/sw     0       0        X  X  X  X  X  X    010   (add)
    beq       X       1        X  X  X  X  X  X    110   (sub)
    arith     1       X        X  X  0  0  0  0    010   (add)
              1       X        X  X  0  0  1  0    110   (sub)
              1       X        X  X  0  1  0  0    000   (and)
              1       X        X  X  0  1  0  1    001   (or)
              1       X        X  X  1  0  1  0    111   (slt)

- ALUOp is generated from decoding inst[31:26]
- funct = inst[5:0]

[Figure: ALU control block (inputs ALUOp and funct, 3-bit ALU control output) driving the ALU (Zero, ALU result)]

ALU Control
- Simple combinational logic (truth tables)

[Figure: ALU control block. Inputs ALUOp1, ALUOp0 and F(5-0), of which F3, F2, F1, F0 are used; outputs Operation2, Operation1, Operation0 forming Operation.]
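One way to turn the truth table above into gates is to exploit the don't-cares and write one Boolean equation per output bit. The equations below are a minimized form that is consistent with the table for the ALUOp values 00, 01 and 10; they are offered as an assumption-laden sketch, not necessarily the exact circuit drawn in the lecture figure.

```python
# Sketch: gate-level equations for the 3-bit ALU control, checked against
# the truth table. All inputs are single bits (0 or 1).

def alu_control_gates(aluop1, aluop0, f3, f2, f1, f0):
    operation2 = aluop0 | (aluop1 & f1)        # OR of ALUOp0 and (ALUOp1 AND F1)
    operation1 = (1 - aluop1) | (1 - f2)       # NOT ALUOp1 OR NOT F2
    operation0 = aluop1 & (f3 | f0)            # ALUOp1 AND (F3 OR F0)
    return (operation2 << 2) | (operation1 << 1) | operation0

# R-type rows of the table: (F3, F2, F1, F0) -> expected ALU control.
R_TYPE_ROWS = {
    (0, 0, 0, 0): 0b010,  # add
    (0, 0, 1, 0): 0b110,  # sub
    (0, 1, 0, 0): 0b000,  # and
    (0, 1, 0, 1): 0b001,  # or
    (1, 0, 1, 0): 0b111,  # slt
}

for funct_bits, expected in R_TYPE_ROWS.items():
    got = alu_control_gates(1, 0, *funct_bits)
    print(funct_bits, format(got, "03b"), "ok" if got == expected else "MISMATCH")
```

Because F5 and F4 are don't-cares in every row, they do not appear in any equation, which is why the figure only wires F3 through F0 into the logic.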

The Main Control Unit
- Control signals derived from instruction

    R-type       0          rs      rt      rd      shamt   funct
                 31:26      25:21   20:16   15:11   10:6    5:0
    Load/Store   35 or 43   rs      rt      address
                 31:26      25:21   20:16   15:0
    Branch       4          rs      rt      address
                 31:26      25:21   20:16   15:0

- Bits 31:26: opcode
- rs: always read
- rt: read, except for load
- rd (R-type) or rt (load): write for R-type and load
- address: sign-extend and add

Datapath With Control
[Figure: single cycle datapath with the control unit added. Annotation: "Use rt not rd".]

    Instruction   RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
    R-format      1       0       0         1         0        0         0       1       0
    lw            0       1       1         1         1        0         0       0       0
    sw            X       1       X         0         0        1         0       0       0
    beq           X       0       X         0         0        0         1       0       1

Commodity Processors
[Figure: ARM 7 single cycle datapath]

Control Unit Signals
- Inputs: Op5 Op4 Op3 Op2 Op1 Op0 = Inst[31:26]
- Decoded instruction classes: R-format, lw, sw, beq
- Outputs (to harness the datapath): RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0
- Output values per instruction class follow the control signal table for R-format, lw, sw and beq
- Programmable logic array (PLA) implementation (B.3)
- Adding a new instruction?

Controller Implementation

    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    USE IEEE.STD_LOGIC_ARITH.ALL;
    USE IEEE.STD_LOGIC_SIGNED.ALL;

    ENTITY control IS
       PORT( SIGNAL Opcode       : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );
             SIGNAL RegDst       : OUT STD_LOGIC;
             SIGNAL ALUSrc       : OUT STD_LOGIC;
             SIGNAL MemtoReg     : OUT STD_LOGIC;
             SIGNAL RegWrite     : OUT STD_LOGIC;
             SIGNAL MemRead      : OUT STD_LOGIC;
             SIGNAL MemWrite     : OUT STD_LOGIC;
             SIGNAL Branch       : OUT STD_LOGIC;
             SIGNAL ALUop        : OUT STD_LOGIC_VECTOR( 1 DOWNTO 0 );
             SIGNAL clock, reset : IN  STD_LOGIC );
    END control;

Controller Implementation (cont.)

    ARCHITECTURE behavior OF control IS
       SIGNAL R_format, Lw, Sw, Beq : STD_LOGIC;
    BEGIN
       -- Code to generate control signals using opcode bits
       R_format <= '1' WHEN Opcode = "000000" ELSE '0';
       Lw       <= '1' WHEN Opcode = "100011" ELSE '0';
       Sw       <= '1' WHEN Opcode = "101011" ELSE '0';
       Beq      <= '1' WHEN Opcode = "000100" ELSE '0';

       -- Implementation of each table column
       RegDst     <= R_format;
       ALUSrc     <= Lw OR Sw;
       MemtoReg   <= Lw;
       RegWrite   <= R_format OR Lw;
       MemRead    <= Lw;
       MemWrite   <= Sw;
       Branch     <= Beq;
       ALUOp( 1 ) <= R_format;
       ALUOp( 0 ) <= Beq;
    END behavior;

Each assignment implements one column of the control signal table for R-format, lw, sw and beq.
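For readers who want to check the controller equations without a VHDL simulator, the same logic can be mirrored in a few lines of Python and compared with the control signal table. This is only a software cross-check under the assumption that the table's X entries may take any value; the function name `main_control` is hypothetical.

```python
# Sketch: the controller's opcode decoding mirrored in Python and checked
# against the control signal table. None marks a don't-care (X) entry.

SIGNALS = ["RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0"]

def main_control(opcode):
    r_format = int(opcode == 0b000000)
    lw = int(opcode == 0b100011)
    sw = int(opcode == 0b101011)
    beq = int(opcode == 0b000100)
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }

TABLE = {
    0b000000: [1, 0, 0, 1, 0, 0, 0, 1, 0],           # R-format
    0b100011: [0, 1, 1, 1, 1, 0, 0, 0, 0],           # lw
    0b101011: [None, 1, None, 0, 0, 1, 0, 0, 0],     # sw
    0b000100: [None, 0, None, 0, 0, 0, 1, 0, 1],     # beq
}

for opcode, row in TABLE.items():
    outputs = main_control(opcode)
    matches = all(want is None or outputs[name] == want
                  for name, want in zip(SIGNALS, row))
    print(format(opcode, "06b"), "matches table" if matches else "MISMATCH")
```

Note that this style of implementation resolves every don't-care to 0, which is one valid choice among several.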

R-Type Instruction
[Figure: datapath activity for an R-type instruction]

Load Instruction
[Figure: datapath activity for a load instruction]

Branch-on-Equal Instruction
[Figure: datapath activity for a branch-on-equal instruction]

Implementing Jumps

    Jump   2       address
           31:26   25:0

- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode
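The concatenation described above can be sketched as a single expression. One assumption to flag: the sketch takes the top 4 bits from PC + 4 (the already-incremented PC), which is how the textbook datapath is usually drawn; the slide simply says "old PC", and the two differ only when PC + 4 crosses a 256 MB boundary. The function name `jump_target` is hypothetical.

```python
# Sketch of the jump target: {top 4 bits of PC + 4, 26-bit address, 00}.
# Assumption: the upper bits come from PC + 4 rather than PC itself.

def jump_target(pc, instruction):
    address26 = instruction & 0x03FFFFFF        # bits 25:0 of the j instruction
    upper4 = (pc + 4) & 0xF0000000              # bits 31:28 of PC + 4
    return upper4 | (address26 << 2)            # shifting left appends "00"


# A j instruction (opcode 2) whose 26-bit field is 0x0100000.
print(hex(jump_target(0x00400018, 0x08100000)))
```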

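Datapath With Jumps Added
[Figure: single cycle datapath with jump support added; clk]

Example: ARM Cortex M3
[Figure: Fitbit Flex teardown showing the ARM processor and the Bluetooth IC. Image sources: www.ifixit.com, zembedded.com]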

Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  - ALU might not produce the right answer right away
  - We use write signals along with the clock to determine when to write
- Cycle time determined by length of the longest path

[Figure: State element 1 -> Combinational logic -> State element 2, spanning one clock cycle]

- We are ignoring some details like setup and hold times

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory -> register file -> ALU -> data memory -> register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining
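To make the critical-path argument concrete, the sketch below sums component delays along the path each instruction class uses and takes the maximum as the clock period. The delay values are purely illustrative assumptions (not taken from the slides), and the per-instruction paths are the usual ones for this subset.

```python
# Sketch: single-cycle clock period = delay of the slowest instruction.
# All delay values below are assumed, in picoseconds, for illustration only.

DELAYS_PS = {
    "instruction memory": 200,
    "register read": 100,
    "ALU": 200,
    "data memory": 200,
    "register write": 100,
}

PATHS = {
    "R-type": ["instruction memory", "register read", "ALU", "register write"],
    "lw": ["instruction memory", "register read", "ALU",
           "data memory", "register write"],
    "sw": ["instruction memory", "register read", "ALU", "data memory"],
    "beq": ["instruction memory", "register read", "ALU"],
}

def instruction_delay(name):
    """Total delay of the components an instruction class passes through."""
    return sum(DELAYS_PS[stage] for stage in PATHS[name])

def cycle_time():
    """The clock period must cover the longest instruction."""
    return max(instruction_delay(name) for name in PATHS)


for name in PATHS:
    print(f"{name:7s} {instruction_delay(name)} ps")
print("clock period:", cycle_time(), "ps")
```

With these assumed numbers the load sets the period, so every beq and R-type instruction spends part of each cycle idle, which is the inefficiency that pipelining targets.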

Summary
- Single cycle datapath
  - All instructions execute in one clock cycle
  - Not all instructions take the same amount of time
  - Software sees a simple interface
  - Can memory operations really take one cycle?
- Improve performance via pipelining, multicycle operation, parallelism or customization
- We will address these next

Study Guide
- Given an instruction, be able to specify the values of all control signals required to execute that instruction
- Add new instructions: modify the datapath and control to effect their execution
  - Modify the dataflow in support, e.g., jal, jr, shift, etc.
  - Modify the VHDL controller
- Given delays of various components, determine the cycle time of the datapath
- Distinguish between those parts of the datapath that are unique to each instruction and those components that are shared across all instructions

Study Guide (cont.)
- Given a set of control signal values, determine what operation the datapath performs
- Know the bit width of each signal in the datapath
- Add support for procedure calls: jal instruction

Glossary
- Asynchronous
- Clock
- Controller
- Critical path
- Cycle time
- Dataflow
- Flip-flop
- Program counter
- Register file
- Sign extension
- Synchronous