Single Cycle Datapath

Lecture notes from MKP, H. H. Lee and S. Yalamanchili
Reading: Section 4.-4.4, Appendices B.7, B.8, B.,.2
Practice Problems:, 4, 6, 9

Introduction
- We will examine two MIPS implementations
  - A simplified version (this module)
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j

Instruction Execution
- PC -> instruction memory, fetch instruction
- Register numbers -> register file, read registers
- Depending on instruction class:
  1. Use ALU to calculate
     - Arithmetic result
     - Memory address for load/store
     - Branch target address
  2. Access data memory for load/store
  3. PC <- target address or PC + 4
[Figure: the PC supplies the address into an encoded program stored in memory]

Basic Ingredients
- Include the functional units we need for each instruction, both combinational and sequential
[Figure: basic datapath elements. Instruction memory (Instruction address in, Instruction out), program counter, and adder (Add, Sum). Register file with 5-bit Read register 1, Read register 2, and Write register inputs, Write data, Read data 1, Read data 2, and RegWrite. ALU with ALU control input, Zero, and ALU result. Data memory with Address, Write data, Read data, MemRead, and MemWrite. Sign-extension unit, 16 bits to 32 bits.]

Sequential Elements (4.2, B.7, B.)
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1
[Figure: clock waveform with rising and falling edges; timing of Clk, D, and Q; D flip-flop built from two D latches]

Sequential Elements
- Register with write control
  - Only updates on a clock edge when the write control input is 1
  - Used when the stored value is required later
[Figure: timing of Clk, Write, D, and Q over one clock cycle time; flip-flop built from two D latches]

Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period
- Synchronous vs. asynchronous operation
- Recall: critical path delay
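The write-controlled register described above can be modeled in a few lines of Python. This is a behavioral sketch only, not the latch-level circuit from the slide; the class name `Register` and the method `tick` are hypothetical names chosen for illustration.

```python
class Register:
    """Behavioral sketch of a rising-edge-triggered register with write control."""

    def __init__(self, value=0):
        self.q = value        # stored value, visible on the Q output
        self._prev_clk = 0    # previous clock level, used to detect edges

    def tick(self, clk, d, write):
        """Sample the inputs; Q changes only on a 0 -> 1 clock edge with write = 1."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


reg = Register()
reg.tick(clk=0, d=7, write=1)   # clock low: no update
reg.tick(clk=1, d=7, write=1)   # rising edge with write = 1: Q takes D
reg.tick(clk=0, d=9, write=1)   # falling edge: no update
reg.tick(clk=1, d=9, write=0)   # rising edge but write = 0: Q holds
print(reg.q)
```

The last call shows why the write control matters: the clock still ticks every cycle, but the stored value survives until a cycle that actually asserts write.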

Register File (B.8)
- Built using D flip-flops (remember ECE 23!)
[Figure: read ports. Read register number 1 and Read register number 2 each drive a multiplexer that selects among Register 0 ... Register n to produce Read data 1 and Read data 2.]

Register File
- Note: we still use the real clock to determine when to write
[Figure: write port. An n-to-2^n decoder on the Register number, gated with Write, drives the C input of exactly one of Register 0 ... Register n; Register data feeds the D input of every register.]
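A behavioral sketch of the register file's two read ports and one write port, assuming 32 registers of 32 bits each. The class and method names are hypothetical. Treating register 0 as hardwired to zero is the MIPS convention and is not shown on the slide.

```python
class RegisterFile:
    """Sketch of a register file: two combinational read ports, one clocked write port."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def read(self, read_reg1, read_reg2):
        # The two multiplexers: each register number selects one stored value.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        # The decoder plus the write signal: at most one register is updated per edge.
        # Assumption: register 0 always reads as zero (MIPS convention).
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_reg=8, write_data=42, reg_write=1)
rf.clock_edge(write_reg=9, write_data=99, reg_write=0)   # write disabled: ignored
print(rf.read(8, 9))
```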

Building a Datapath (4.3)
- Datapath
  - Elements that process data and addresses in the CPU
    - Registers, ALUs, muxes, memories, ...
- We will build a MIPS datapath incrementally
  - Refining the overview design

High Level Description
[Figure: a Control block coordinating Fetch Instructions, Execute Instructions, and Memory Operations]
- Single instruction, single data stream model of execution
  - Serial execution model
- Commonly known as the von Neumann execution model
  - Stored program model
  - Instructions and data share memory

Instruction Fetch
[Figure: the PC, a 32-bit register, addresses instruction memory; an adder increments the PC by 4 for the next instruction. The clk waveform marks the start and completion of instruction fetch within one cycle time.]

R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result
Fields: op | rs | rt | rd | shamt | funct
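Instruction fetch and the R-format field layout can be sketched as follows. The function names `fetch` and `decode_r_format` are hypothetical, and instruction memory is modeled as a plain Python list of 32-bit words indexed by word address.

```python
def fetch(instr_mem, pc):
    """Return the instruction at byte address pc and the next sequential PC."""
    instruction = instr_mem[pc // 4]   # instructions are 4 bytes wide
    return instruction, pc + 4


def decode_r_format(instruction):
    """Split a 32-bit word into the R-format fields op, rs, rt, rd, shamt, funct."""
    return {
        "op":    (instruction >> 26) & 0x3F,   # bits 31:26
        "rs":    (instruction >> 21) & 0x1F,   # bits 25:21
        "rt":    (instruction >> 16) & 0x1F,   # bits 20:16
        "rd":    (instruction >> 11) & 0x1F,   # bits 15:11
        "shamt": (instruction >> 6) & 0x1F,    # bits 10:6
        "funct": instruction & 0x3F,           # bits 5:0
    }


# 0x012A4020 encodes add $t0, $t1, $t2 (rs = 9, rt = 10, rd = 8, funct = 0x20).
instr_mem = [0x012A4020]
instruction, next_pc = fetch(instr_mem, 0)
fields = decode_r_format(instruction)
print(fields, next_pc)
```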

Executing R-Format Instructions
[Figure: register file (5-bit Read register 1, Read register 2, Write register; Write data; Read data 1, Read data 2; RegWrite) feeding the ALU (ALU control, Zero, ALU result)]
Fields: op | rs | rt | rd | shamt | funct

Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: read memory and update register
- Store: write register value to memory
Fields: op | rs | rt | 16-bit constant

Executing I-Format Instructions
[Figure: register file (Read register 1, Read register 2, Write register, RegWrite), data memory (Address, Write data, Read data, MemRead, MemWrite), and sign-extension unit (16 bits to 32 bits)]
Fields: op | rs | rt | 16-bit constant

Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch
Fields: op | rs | rt | 16-bit constant
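The address arithmetic for loads, stores, and branches described above reduces to a sign extension plus one addition. The sketch below uses hypothetical function names and models 32-bit wraparound with a mask.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def load_store_address(base, imm16):
    """Memory address for lw/sw: base register plus sign-extended offset."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


def branch_target(pc, imm16):
    """Branch target: (PC + 4) plus the sign-extended word displacement shifted left 2."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


print(hex(load_store_address(0x10010000, 0xFFFC)))   # offset of -4 bytes
print(hex(branch_target(0x00400000, 3)))             # 3 instructions past PC + 4
print(hex(branch_target(0x00400010, 0xFFFF)))        # displacement -1: branch to itself
```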

Branch Instructions
[Figure: sign extension and shift left 2. The shift just re-routes wires; the sign-bit wire is replicated.]
Fields: op | rs | rt | 16-bit constant

Updating the Program Counter
[Figure: computation of the branch address. One adder forms PC + 4; a second adder sums PC + 4 with the sign-extended Instruction [15-0] shifted left by 2; Branch selects the result. Instruction memory outputs Instruction [31-0], with fields Instruction [25-21], [20-16], [15-11], and [15-0].]

    loop: beq  $t0, $0, exit
          addi $t0, $t0, -1
          lw   $a0, arg1($t1)
          lw   $a1, arg2($t2)
          jal  func
          add  $t3, $t3, $v0
          addi $t1, $t1, 4
          addi $t2, $t2, 4
          j    loop

Composing the Elements
- First-cut datapath does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions
[Figure: the PC supplies the address into an encoded program]

Full Single Cycle Datapath
[Figure: full single cycle datapath]
- Destination register is instruction-specific: lw $t0, 0($t4) vs. add $t0, $t1, $t2
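The role of the multiplexers can be sketched as three two-way selections. The control signal names RegDst, ALUSrc, and MemtoReg follow the slides; the helper function names are hypothetical.

```python
def mux(select, in0, in1):
    """Two-input multiplexer: output in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def write_register(reg_dst, rt, rd):
    # Destination register: rt for lw, rd for R-format instructions.
    return mux(reg_dst, rt, rd)


def alu_operand_b(alu_src, read_data2, sign_extended_imm):
    # Second ALU input: a register value, or the sign-extended immediate for lw/sw.
    return mux(alu_src, read_data2, sign_extended_imm)


def write_back_data(mem_to_reg, alu_result, mem_read_data):
    # Value written to the register file: the ALU result, or loaded data for lw.
    return mux(mem_to_reg, alu_result, mem_read_data)


# add $t0, $t1, $t2: destination is rd (register 8)
print(write_register(reg_dst=1, rt=10, rd=8))
# lw $t0, 0($t4): destination is rt (register 8)
print(write_register(reg_dst=0, rt=8, rd=0))
```

This is the point of the lw vs. add comparison on the slide: the same write port is used, but the register number comes from a different instruction field.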

The Main Control Unit
- Control signals derived from instruction

    Type         Fields (bit positions)
    R-type       0 (31:26)         rs (25:21)   rt (20:16)   rd (15:11)   shamt (10:6)   funct (5:0)
    Load/Store   35 or 43 (31:26)  rs (25:21)   rt (20:16)   address (15:0)
    Branch       4 (31:26)         rs (25:21)   rt (20:16)   address (15:0)

  - Bits 31:26: opcode
  - rs: always read
  - rt: read, except for load
  - rd (R-type) or rt (load): write for R-type and load
  - address: sign-extend and add

ALU Control (4.4,.2)
- ALU used for
  - Load/Store: Function = add
  - Branch: Function = subtract
  - R-type: Function depends on funct field

    ALU control   Function
    0000          AND
    0001          OR
    0010          add
    0110          subtract
    0111          set-on-less-than
    1100          NOR
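The ALU function table can be exercised with a small behavioral model. This is a sketch assuming 32-bit operands held in Python integers; the function name `alu` and the helper `to_signed` are hypothetical. Only the control codes listed on the slide are handled.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the 4-bit ALU control codes from the table."""
    if control == 0b0000:
        result = a & b                                   # AND
    elif control == 0b0001:
        result = a | b                                   # OR
    elif control == 0b0010:
        result = a + b                                   # add
    elif control == 0b0110:
        result = a - b                                   # subtract
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0 # set-on-less-than
    elif control == 0b1100:
        result = ~(a | b)                                # NOR
    else:
        raise ValueError("unsupported ALU control code")
    result &= MASK32                                     # wrap to 32 bits
    return result, int(result == 0)


print(alu(0b0010, 7, 5))   # add
print(alu(0b0110, 5, 5))   # subtract: Zero output is 1, as used by beq
```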

ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

    opcode   ALUOp   Operation          funct    ALU function       ALU control
    lw       00      load word          XXXXXX   add                0010
    sw       00      store word         XXXXXX   add                0010
    beq      01      branch equal       XXXXXX   subtract           0110
    R-type   10      add                100000   add                0010
                     subtract           100010   subtract           0110
                     AND                100100   AND                0000
                     OR                 100101   OR                 0001
                     set-on-less-than   101010   set-on-less-than   0111

- How do we turn this description into gates?

ALU Controller

                ALUOp1 ALUOp0   F5 F4 F3 F2 F1 F0   ALU control
    lw/sw       0      0        X  X  X  X  X  X    0010 (add)
    beq         X      1        X  X  X  X  X  X    0110 (sub)
    arith       1      X        X  X  0  0  0  0    0010 (add)
                1      X        X  X  0  0  1  0    0110 (sub)
                1      X        X  X  0  1  0  0    0000 (and)
                1      X        X  X  0  1  0  1    0001 (or)
                1      X        X  X  1  0  1  0    0111 (slt)

- ALUOp is generated from decoding inst[31:26]
- funct = inst[5:0]
[Figure: the ALU control block takes ALUOp and funct and drives the ALU control input of the ALU (Zero, ALU result)]
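The ALU controller truth table, including its don't-care entries, can be written directly as a function. This is a sketch with a hypothetical function name; it returns the 4-bit ALU control codes used in the ALU function table.

```python
def alu_control(alu_op, funct):
    """Map 2-bit ALUOp and 6-bit funct to the 4-bit ALU control code."""
    if alu_op == 0b00:
        return 0b0010              # lw/sw: add, funct is a don't care
    if alu_op & 0b01:
        return 0b0110              # beq: subtract (ALUOp0 = 1, ALUOp1 is a don't care)
    # R-type (ALUOp1 = 1): only F3..F0 matter; F5 and F4 are don't cares.
    by_funct_low_bits = {
        0b0000: 0b0010,            # add
        0b0010: 0b0110,            # subtract
        0b0100: 0b0000,            # AND
        0b0101: 0b0001,            # OR
        0b1010: 0b0111,            # set-on-less-than
    }
    return by_funct_low_bits[funct & 0xF]


print(format(alu_control(0b10, 0b100010), "04b"))   # R-type subtract
```

Looking only at the low four funct bits mirrors the don't cares in the table, which is what keeps the gate-level version small.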

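ALU Control
- Simple combinational logic (truth tables)
[Figure: ALU control block. Inputs: ALUOp1, ALUOp0, and funct bits F3, F2, F1, F0 from F(5-0). Outputs: Operation2, Operation1, Operation0.]

Datapath With Control
[Figure: single cycle datapath with the control unit added. Annotation for lw: use rt, not rd.]

    Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
    R-format      1        0        0          1          0         0          0        1        0
    lw            0        1        1          1          1         0          0        0        0
    sw            X        1        X          0          0         1          0        0        0
    beq           X        0        X          0          0         0          1        0        1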
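The control table can be captured as a lookup keyed by opcode. The sketch below uses hypothetical Python names; `None` stands for the don't-care (X) entries, and the opcode values 0, 35, 43, and 4 are the standard MIPS encodings for R-format, lw, sw, and beq.

```python
X = None  # don't care

CONTROL_TABLE = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    43: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0, MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    4:  dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0, MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def main_control(opcode):
    """Return the control signals for a 6-bit opcode (the four instructions above only)."""
    return CONTROL_TABLE[opcode]


def pc_src(branch, zero):
    """The branch target is taken only when Branch is asserted and the ALU reports Zero."""
    return branch & zero


signals = main_control(35)
print(signals)
```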

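Commodity Processors
[Figure: ARM 7 single cycle datapath]

Control Unit Signals
- Inputs: Op5, Op4, Op3, Op2, Op1, Op0 = Inst[31:26]
- Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0 (to harness the datapath)
[Figure: decoding logic that turns the opcode bits into the R-format, lw, sw, and beq lines, which in turn drive the outputs]

    Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
    R-format      1        0        0          1          0         0          0        1        0
    lw            0        1        1          1          1         0          0        0        0
    sw            X        1        X          0          0         1          0        0        0
    beq           X        0        X          0          0         0          1        0        1

- Adding a new instruction?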

Controller Implementation

    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    USE IEEE.STD_LOGIC_ARITH.ALL;
    USE IEEE.STD_LOGIC_SIGNED.ALL;

    ENTITY control IS
      PORT( SIGNAL Opcode       : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );
            SIGNAL RegDst       : OUT STD_LOGIC;
            SIGNAL ALUSrc       : OUT STD_LOGIC;
            SIGNAL MemtoReg     : OUT STD_LOGIC;
            SIGNAL RegWrite     : OUT STD_LOGIC;
            SIGNAL MemRead      : OUT STD_LOGIC;
            SIGNAL MemWrite     : OUT STD_LOGIC;
            SIGNAL Branch       : OUT STD_LOGIC;
            SIGNAL ALUop        : OUT STD_LOGIC_VECTOR( 1 DOWNTO 0 );
            SIGNAL clock, reset : IN  STD_LOGIC );
    END control;

Controller Implementation (cont.)

    ARCHITECTURE behavior OF control IS
      SIGNAL R_format, Lw, Sw, Beq : STD_LOGIC;
    BEGIN
      -- Code to generate control signals using opcode bits
      R_format <= '1' WHEN Opcode = "000000" ELSE '0';
      Lw       <= '1' WHEN Opcode = "100011" ELSE '0';
      Sw       <= '1' WHEN Opcode = "101011" ELSE '0';
      Beq      <= '1' WHEN Opcode = "000100" ELSE '0';
      -- Implementation of each table column
      RegDst     <= R_format;
      ALUSrc     <= Lw OR Sw;
      MemtoReg   <= Lw;
      RegWrite   <= R_format OR Lw;
      MemRead    <= Lw;
      MemWrite   <= Sw;
      Branch     <= Beq;
      ALUOp( 1 ) <= R_format;
      ALUOp( 0 ) <= Beq;
    END behavior;

Each assignment implements one column of the control table:

    Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
    R-format      1        0        0          1          0         0          0        1        0
    lw            0        1        1          1          1         0          0        0        0
    sw            X        1        X          0          0         1          0        0        0
    beq           X        0        X          0          0         0          1        0        1

R-Type Instruction
[Figure: datapath operation for an R-type instruction]

Load Instruction
[Figure: datapath operation for a load instruction]

Branch-on-Equal Instruction
[Figure: datapath operation for a branch-on-equal instruction]

Implementing Jumps
Fields: Jump: 2 (31:26) | address (25:0)
- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode
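The concatenation that forms the jump target can be sketched with shifts and masks. The function name is hypothetical. One assumption to note: the sketch takes the top 4 bits from PC + 4, which is how MIPS defines the jump; this differs from using the old PC only when the jump sits at the very end of a 256 MB region.

```python
def jump_target(pc, instruction):
    """Concatenate the top 4 bits of PC + 4, the 26-bit address field, and 00."""
    address26 = instruction & 0x03FFFFFF        # bits 25:0 of the instruction
    upper4 = (pc + 4) & 0xF0000000              # assumption: upper bits come from PC + 4
    return upper4 | (address26 << 2)            # shifting left by 2 appends the 00


# 0x08100004 is a jump (opcode 2) whose address field is 0x0100004.
print(hex(jump_target(0x00400000, 0x08100004)))
```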

Datapath With Jumps Added
[Figure: single cycle datapath with jump support, driven by clk]

Energy Behavior
[Figure: energy behavior of the datapath: combinational activity and storage read/write access]

Recall: Hierarchy of Energy Models
- Switch level: activity (dynamic) and leakage (static) energy costs
  [Figure: CMOS inverter with Vin, Vout, Vdd, PMOS, NMOS, and Ground]
- Aggregate energy expenditure into gate level estimates
  [Figure: logic gates with inputs a, b, c and outputs x, y]
- Aggregate energy expenditure into higher level modules
  [Figure: flip-flop built from two D latches; ALU]

A Simple Architecture Energy Model
- To a first order, we can use the per-access energy of each major component
  - Obtain this for a technology generation
- Use this per-access energy to compute the energy of each instruction
- Note:
  - This is a high level approximation. The actual physics is more complicated.
  - However, this is useful for several purposes
- What components does each instruction exercise?

Example: Updating the PC
[Figure: single cycle datapath with control signals RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite]
- What is the energy cost of this operation?

Example: Register Instructions
[Figure: single cycle datapath, as above]
- What is the energy cost of this operation?

Example: I-type Instructions
[Figure: single cycle datapath with control signals RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite]
- What is the energy cost of this operation?

Example: I-Type for Branches
[Figure: single cycle datapath, as above]
- What is the energy cost of this operation?

Converting Energy to Power
- For this datapath, except for memory, all components are active every cycle and dissipate energy on every cycle
  - Later we will see how datapaths can be made more energy efficient
- Computing power
  - Compute the total energy consumed over all cycles (instructions)
  - Divide energy by time to get power in watts

Example: A Simple Energy Model
- We can use a simple model of per-access energy for the architecture components

    Common Components                                           Access Energy (10^-12 joules)
    Inst. Decode Logic Switching                                6.78
    Inst. Registers                                             2.74    4.38
    FP. Registers                                               .26     .98
    Other Buffers                                               9.74    .8
    ALU + Result Bus (interconnect) Logic Switching             23.2
    FPU + Result Bus (interconnect) Logic Switching             24.2
    @6nm

- Each unit can be accessed multiple times depending on instruction type
- An Intel/AMD x86 instruction consumes 6pJ ~ 4nJ of dynamic energy.
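The energy-to-power calculation can be sketched as follows. All per-access energies below are made-up placeholder values, not figures from the slide's table, and the component lists per instruction are a simplification that charges only the components an instruction uses. All names are hypothetical.

```python
# Hypothetical per-access energies in picojoules (placeholders for illustration only).
ENERGY_PJ = {
    "instr_mem": 10.0,
    "reg_read": 3.0,
    "reg_write": 4.0,
    "alu": 20.0,
    "data_mem": 15.0,
}

# Components exercised by each instruction class (one entry per access).
ACCESSES = {
    "r_type": ["instr_mem", "reg_read", "reg_read", "alu", "reg_write"],
    "lw":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["instr_mem", "reg_read", "reg_read", "alu", "data_mem"],
    "beq":    ["instr_mem", "reg_read", "reg_read", "alu"],
}


def instruction_energy_pj(kind):
    """Sum the per-access energy of every component the instruction exercises."""
    return sum(ENERGY_PJ[component] for component in ACCESSES[kind])


def average_power_watts(program, clock_hz):
    """Total energy divided by total time, with one instruction per clock cycle."""
    total_joules = sum(instruction_energy_pj(kind) for kind in program) * 1e-12
    total_seconds = len(program) / clock_hz
    return total_joules / total_seconds


program = ["lw", "r_type", "sw", "beq"]
print(instruction_energy_pj("lw"), average_power_watts(program, 1e9))
```

Because the single cycle datapath completes one instruction per cycle, the execution time is simply the instruction count divided by the clock rate.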

ITRS Roadmap for Logic Devices
[Figure: ITRS roadmap for logic devices]
From: ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems, P. Kogge et al., 2008

Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  - ALU might not produce the right answer right away
  - We use write signals along with the clock to determine when to write
- Cycle time determined by length of the longest path
[Figure: State element 1 -> Combinational logic -> State element 2, within one clock cycle]
- We are ignoring some details like setup and hold times

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory -> register file -> ALU -> data memory -> register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining

Summary
- Single cycle datapath
  - All instructions execute in one clock cycle
  - Not all instructions take the same amount of time
  - Software sees a simple interface
  - Can memory operations really take one cycle?
- Improve performance via pipelining, multicycle operation, parallelism, or customization
- We will address these next
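The claim that the load instruction sets the clock period can be checked with a short calculation. The component delays below are assumed values for illustration, not numbers from these notes, and the names are hypothetical.

```python
# Assumed component delays in picoseconds (illustrative values only).
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Components on the path of each instruction class, in order.
PATHS = {
    "r_type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "lw":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["instr_mem", "reg_read", "alu", "data_mem"],
    "beq":    ["instr_mem", "reg_read", "alu"],
}


def instruction_delay_ps(kind):
    """Delay of one instruction: the sum of the delays along its path."""
    return sum(DELAY_PS[component] for component in PATHS[kind])


def cycle_time_ps():
    """The single clock period must cover the slowest instruction."""
    return max(instruction_delay_ps(kind) for kind in PATHS)


critical = max(PATHS, key=instruction_delay_ps)
print(critical, cycle_time_ps())
```

With these assumed delays every instruction pays the load's latency, including the branch that needs far less, which is the sense in which the design works against making the common case fast.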

Study Guide
- Given an instruction, be able to specify the values of all control signals required to execute that instruction
- Add new instructions: modify the datapath and control to effect their execution
  - E.g., jal, jr, shift, etc.
  - Modify the VHDL controller
- Given delays of various components, determine the cycle time of the datapath
- Distinguish between those parts of the datapath that are unique to each instruction and those components that are shared across all instructions

Study Guide (cont.)
- Given a set of control signal values, determine what operation the datapath performs
- Given the per-access energies of each component:
  - Compute the energy required by any instruction
  - Given a program and clock rate, compute the power dissipation of the datapath

Glossary
- Asynchronous
- Clock
- Controller
- Critical path
- Flip Flop
- ITRS Roadmap
- Per-access energy
- Program counter
- Register
- Synchronous