Learning Outcomes. Spiral 3-3. Sorting: Software Implementation REVIEW

Spiral 3-3: Single Cycle CPU

Learning Outcomes
I understand how the single-cycle CPU datapath supports each type of instruction
I understand why each mux is needed to select appropriate inputs to the datapath components
I know how to design the control signals as a function of the type of instruction

Hardware vs. Software REVIEW

Sorting: Software Implementation
To perform the algorithm in software means the processor fetches instructions and executes them, which causes the processor to read the data and write it back to memory in its sorted positions
Sorting 6 element on a .8 GHz Xeon processor: 6 microseconds
Can we do better w/ more HW?
[Figure: Processor and Memory, with Custom (Sort) HW attached]

Sorting: Hardware Implementation
Sorting 6 element on a .8 GHz Xeon processor [SW only]: 6 microseconds
Sorting 6 numbers in [old] custom HW: period = 3 ns => 6 microseconds total
3 ns is due to the 8 number HW sorter
Merging (Select-Val) stages are < ns
Can we improve?
What did we do to reduce period in this design?
[Figure: values arrive from memory ( per clock) at the HW Sorting Network (inputs X, outputs Y), then pass through FIFO/Queue pairs and SelectVal merging stages before returning to memory]

Sorting: Final Comparison
Sorting 6 element on a .8 GHz Xeon processor [SW only]: 6 microseconds total time
Sorting 6 numbers in [old] custom HW: period = 3 ns => 6 microseconds total = ~.x speedup
Sorting 6 numbers in [old] pipelined HW: period = ns => microseconds total = ~8x speedup
Processor is freed to do other work
[Figure: Processor and Memory, with Custom (Sort) HW attached]

CPU Organization
Building hardware to execute software
GENERAL PURPOSE HARDWARE

Scope
We will build a CPU to implement our subset of the MIPS ISA
Memory Reference Instructions: Load Word (LW), Store Word (SW)
Arithmetic and Logic Instructions: ADD, SUB, AND, OR, SLT
Branch and Jump Instructions: Branch if equal (BEQ), Jump unconditional (J)
These basic instructions exercise a majority of the necessary datapath and control logic for a more complete implementation

Single-Cycle CPU Datapath
[Figure: complete single-cycle CPU datapath with PC, instruction memory, register file, ALU, data memory, and control unit]

Fetch
Address in PC is used to fetch instruction while it is also incremented by 4 to point to the next instruction
Remember, the PC doesn't update until the end of the clock cycle / beginning of next cycle
Mux provides a path for branch target addresses
[Figure: fetch datapath showing the PC, the incremented PC, and the branch target input]

Decode
Opcode and func. field are decoded to produce other control signals
Execution of an ALU instruction (ADD $3,$,$) requires reading two register values and writing the result to a third
REGWrite is an enable signal indicating the write data should be written to the specified register
[Figure: instruction word (opcode, rs, rt, rd, shamt, func) driving the control logic and the register file's read and write register numbers]

Register File
Register file is the collection of GPR's.
Our register file has 3 ports (ability to concurrently read or write a register).
To see why we need 3, consider an ADD $3,$,$. We need to read two operands (i.e. $ $) and write the result ($3).
32 registers each storing 32-bits
Read registers => Muxes to choose desired value
Write register => Decoder and registers w/ enable
5-to-32 decoder converts 5-bit write reg. # to 1-of-32 output signals to enable that register to capture the write data on the next edge. If RegWrite is 0, the decoder is disabled, making all outputs go to 0 and thus no register updates.
Each Mux chooses which register value to output based on the 5-bit reg. # provided by the instruction
[Figure: register file internals with write decoder, registers with enables, and two read muxes]
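The register file behavior described above (two read ports built from muxes, one write port built from a decoder plus register enables) can be sketched in software. This is only a behavioral model under stated assumptions: the class name `RegisterFile` and the method names are hypothetical, and the clock edge is modeled as an explicit method call.

```python
class RegisterFile:
    """Hypothetical behavioral model: 32 registers of 32 bits, 2 read ports, 1 write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Each read port acts like a 32-to-1 mux selected by a 5-bit register number.
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def clock_edge(self, reg_write, write_reg, write_data):
        # 5-to-32 decoder: exactly one enable is active, and only when reg_write is 1.
        enables = [reg_write == 1 and i == (write_reg & 0x1F) for i in range(32)]
        for i, enabled in enumerate(enables):
            if enabled:
                self.regs[i] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(1, 1, 7)           # load some operand values
rf.clock_edge(1, 2, 5)
a, b = rf.read(1, 2)             # read two operands concurrently
rf.clock_edge(1, 3, a + b)       # write the sum to a third register
rf.clock_edge(0, 3, 99)          # reg_write = 0: decoder disabled, nothing changes
```

With the write enable at 0, no register captures the write data, which mirrors the disabled-decoder case in the slide text.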

Datapath for ALU instruction
ALU takes inputs from register file and performs the add, sub, and, or, slt operations
Result is written back to dest. register
[Figure: ADD $3,$,$ reading two register values into the ALU and writing the result back]

Memory Access Datapath
Operands are read from register file while offset is sign extended
ALU calculates the effective address
Memory access is performed
If LW, the read data is written back to the register file
[Figure: LW $,0xfff8($) and SW $3,xa($) datapaths; the offset 0xfff8 is sign extended to 0xfffffff8 and added to the base register value]

Branch Datapath
BEQ requires:
ALU for comparison (examine zero output)
Sign extension unit for branch offset
Adder to add PC and offset
Need a separate adder since ALU is used to perform comparison
[Figure: BEQ $,$,offset; the sign extended word offset is shifted to a byte offset and added to the incremented PC to form the Branch Target Address, while the ALU ZERO output goes to the control logic]

Fetch Datapath Question
Can the adder used to increment the PC be an ALU and be used/shared for instructions like ADD/SUB/etc.?
In a single-cycle CPU,
[Figure: Next PC = Current PC + 4; the PC supplies the address to I-MEM, which outputs the instruction word]
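The branch datapath above can be summarized as a small next-PC computation. This is a sketch rather than the slides' own design: the function name `next_pc` is hypothetical, and it assumes standard MIPS conventions (PC incremented by 4, word offset shifted left by 2 to become a byte offset, branch taken when the ALU zero output is 1).

```python
def next_pc(pc, offset16, zero, branch):
    """Return the next PC for a sequential instruction or a BEQ (assumed MIPS rules)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF      # dedicated adder increments the PC

    # Sign-extend the 16-bit word offset to a signed Python integer.
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000

    # Separate adder: incremented PC + (word offset shifted to a byte offset).
    target = (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF

    # The ALU is busy comparing the two registers; its zero output selects the target.
    return target if (branch and zero) else pc_plus_4
```

Note that the target is computed whether or not the branch is taken; the final selection simply discards it when the comparison fails.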

Fetch Datapath Question
Do we need the write enable signal on the PC register for our single-cycle CPU?
In the single-cycle CPU,
[Figure: Next PC = Current PC + 4; the PC supplies the address to I-MEM, which outputs the instruction word]

RegFile Question
Why do we need the write enable signal, REGWrite?
[Figure: instruction word (opcode, rs, rt, rd, shamt, func) driving the control logic and the register file]

RegFile Question
Can write to registers be level sensitive or does it have to be edge-sensitive?
[Figure: instruction word (opcode, rs, rt, rd, shamt, func) driving the control logic and the register file]

Sign Extension Unit
In LW or SW instructions, with their base register + offset format, the instruction only contains the offset as a 16-bit value
Example: LW $,-8($)
Machine Code: x8cfff8
-8 = 0xfff8
The 16-bit offset must be extended to 32-bits before being added to base register
[Figure: LW $,0xfff8($) with fields opcode, rs, rt, offset; the 16-bit offset 0xfff8 is sign extended to the 32-bit value 0xfffffff8]
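As a quick check of the example above, sign extension can be written as replicating bit 15 of the offset into the upper 16 bits. The function name `sign_extend_16_to_32` is hypothetical; in hardware this is only wiring (copies of the sign bit), and the Python below merely mimics that.

```python
def sign_extend_16_to_32(value16):
    """Extend a 16-bit two's complement value to 32 bits by replicating bit 15."""
    value16 &= 0xFFFF
    sign_bit = (value16 >> 15) & 1
    upper_half = 0xFFFF if sign_bit else 0x0000   # 16 copies of the sign bit
    return (upper_half << 16) | value16


print(hex(sign_extend_16_to_32(0xFFF8)))  # the -8 offset from the LW example
```

Positive offsets are left unchanged in the low half and receive zeros in the upper half.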

Sign Extension Questions
What logic is inside a sign-extension unit?
How do we sign extend a number?
Do you need a shift register?
[Figure: 16-bit offset bits wired to a 32-bit sign-extended output]

Branch Datapath Question
Is it okay to start adding branch offset even before determining whether the branch is taken or not?
[Figure: BEQ $,$,offset; the sign extended word offset is shifted and added to the incremented PC to form the Branch Target Address, while the ALU ZERO output goes to the control logic]

Combining Datapaths
Now we will take the datapaths for each instruction type and try to combine them into one
Anywhere we have multiple options for a certain input we can use a mux to select the appropriate value for the given instruction
Select bits must be generated to control the mux

ALUSrc Mux
Mux controlling second input to ALU
ALU instruction provides Register data to the 2nd input of ALU
LW/SW uses 2nd input of ALU as an offset to form effective address
[Figure: ALUSrc mux choosing between the second register value and the sign-extended offset 0xfffffff8]
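The mux on the ALU's second input can be modeled as a two-way selection. This is an illustrative sketch: the helper names `mux2` and `alu_second_input` are hypothetical, and the select polarity (0 picks the register value, 1 picks the sign-extended offset) is an assumption, not something the slides state.

```python
def mux2(select, input0, input1):
    """2-to-1 mux: returns input0 when select is 0, input1 when select is 1."""
    return input1 if select else input0


def alu_second_input(alu_src, read_data2, sign_extended_offset):
    # Assumed polarity: 0 -> register data (ALU instruction), 1 -> offset (LW/SW).
    return mux2(alu_src, read_data2, sign_extended_offset)


# ALU instruction: second operand comes from the register file.
print(alu_second_input(0, 5, 0xFFFFFFF8))
# LW/SW: second operand is the sign-extended offset used to form the effective address.
print(hex(alu_second_input(1, 5, 0xFFFFFFF8)))
```

The same `mux2` helper applies wherever the combined datapath has two candidate sources for one input.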

MemtoReg Mux
Mux controlling writeback value to register file
ALU instructions use the result of the ALU
LW uses the read data from data memory
[Figure: MemtoReg mux choosing between the ALU result and the data memory read data]

PCSrc Mux
Next instruction can either be at the next sequential address (PC+4) or the branch target address (PC+offset)
[Figure: PCSrc mux choosing between PC+4 and the Branch Target Address]

RegDst Mux
Different destination register ID fields for ALU and LW instructions

R-Type (ALU), destination register number in rd:
Field:  opcode  rs     rt     rd     shamt  func
Bits:   31-26   25-21  20-16  15-11  10-6   5-0

I-Type (LW), destination register number in rt:
Field:  opcode (35 or 43)  rs     rt     address offset
Bits:   31-26              25-21  20-16  15-0

[Figure: RegDst mux choosing between the rt and rd fields as the write register number]

Single-Cycle CPU Datapath
[Figure: combined single-cycle CPU datapath with control unit and the muxes above]
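To make the destination-register choice concrete, the sketch below slices a 32-bit instruction word into its fields and then picks rd or rt. The function names, the select polarity (1 picks rd), and the two sample encodings (`0x00221820` for an ADD and `0x8C41FFF8` for an LW) are my own illustrative assumptions based on the standard MIPS formats, not values taken from the slides.

```python
def decode_fields(inst):
    """Slice a 32-bit MIPS instruction word into its R-type / I-type fields."""
    return {
        "opcode": (inst >> 26) & 0x3F,   # bits 31-26
        "rs":     (inst >> 21) & 0x1F,   # bits 25-21
        "rt":     (inst >> 16) & 0x1F,   # bits 20-16
        "rd":     (inst >> 11) & 0x1F,   # bits 15-11 (R-type only)
        "shamt":  (inst >> 6) & 0x1F,    # bits 10-6  (R-type only)
        "func":   inst & 0x3F,           # bits 5-0   (R-type only)
        "offset": inst & 0xFFFF,         # bits 15-0  (I-type only)
    }


def destination_register(inst, reg_dst):
    # Assumed polarity: reg_dst = 1 selects rd (ALU instruction), 0 selects rt (LW).
    fields = decode_fields(inst)
    return fields["rd"] if reg_dst else fields["rt"]


print(destination_register(0x00221820, 1))  # illustrative R-type ADD word
print(destination_register(0x8C41FFF8, 0))  # illustrative I-type LW word
```

All fields are extracted every cycle; the control signals decide which of them are actually used.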

Single-Cycle CPU Datapath
[Figure: complete single-cycle CPU datapath with control unit]

Jump Implementation
Jump Address = {NewPC[31:28], INST[25:0], 00}
[Figure: single-cycle CPU datapath extended with a Jump Address path and a next-address mux alongside the Branch Address]

Control Unit Design for Single-Cycle CPU
Control Unit: Maps instruction to control signals

Traditional Control Unit
FSM: Produces control signals asserted at different times
Design NSL, SM, OFL
# of FF's in tightly-encoded state assignment: 5-8 states: 3, 9-16 states: 4
[Figure: Inputs (Instruction/Opcode) feed the NSL, then the SM (state), then the OFL, which produces the Outputs]

Single-Cycle Control Unit
Every cycle we perform the same steps: Fetch, Decode, Execute
Signals are not necessarily time based but instruction based => only combinational logic
Traditional to Single-Cycle Control Unit: Only 1 state => 0 FF's

Control Unit
Most control signals are a function of the opcode (i.e. LW/SW, R-Type, Branch, Jump)
ALU control is a function of opcode AND function bits.
[Figure: Control Unit taking OpCode ([31:26]) and producing Branch, MemRead, MemWrite, ALUSrc, RegWrite, and ALUOp[1:0]; ALU Control Unit taking ALUOp[1:0] and Func. ([5:0]) and producing the ALU control bits]
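The jump address concatenation above (upper four bits of the incremented PC, the 26-bit field from the instruction, and two zero bits) can be checked with a short sketch. The function name `jump_address` and the sample values are hypothetical.

```python
def jump_address(new_pc, inst):
    """Form {new_pc[31:28], inst[25:0], 00} as a 32-bit address."""
    upper4 = new_pc & 0xF0000000          # keep bits 31-28 of the incremented PC
    index26 = inst & 0x03FFFFFF           # bits 25-0 of the instruction word
    return upper4 | (index26 << 2)        # shifting left by 2 appends the two zero bits


# Hypothetical example: incremented PC 0x40000004, jump instruction word 0x08000010.
print(hex(jump_address(0x40000004, 0x08000010)))
```

Because only bits 31-28 come from the PC, a jump can reach any word-aligned address within the current 256 MB region.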

ALU Control Truth Table
ALU Control needs to know what instruction type it is:
R-Type (op. depends on func. code)
LW/SW (op. = ADD)
BEQ (op. = SUB)
Let main control unit produce ALUOp[1:0] to indicate instruction type, then use function bits if necessary to tell the ALU what to do
Control unit maps instruction opcode to ALUOp[1:0] encoding (one encoding each for LW/SW, Branch, and R-Type)
ALU control is a function of: ALUOp[1:0] and Func.[5:0]

Instruction   Operation          Func.[5:0]          Desired ALU Action
LW            Load word          X                   Add
SW            Store word         X                   Add
Branch        BEQ                X                   Subtract
R-Type        AND                (from func. code)   And
R-Type        OR                 (from func. code)   Or
R-Type        Add                (from func. code)   Add
R-Type        Sub                (from func. code)   Subtract
R-Type        SLT                (from func. code)   Set on less than

Produce each ALU control bit from the ALUOp and Func. inputs

Control Signal Generation
Other control signals are a function of the opcode
We could write a full truth table or (because we are only implementing a small subset of instructions) simply decode the opcodes of the specific instructions we are implementing and use those intermediate signals to generate the actual control signals
Could generate each control signal by writing a full truth table of the 6-bit opcode
Simpler for human to design if we decode the opcode and then use individual instruction signals to generate desired control signals
[Figure: OpCode ([31:26]) feeding a Decoder that produces R-Type, LW, SW, BEQ signals, which in turn generate Branch, MemRead, MemWrite, ALUSrc, RegWrite, and ALUOp[1:0]]

Control Signal Truth Table
[Table: one entry per instruction type (R-Type, LW, SW, BEQ, J) for each control signal: Branch, RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, ALUOp[1], ALUOp[0]; X marks don't-care entries]
[Figure: single-cycle CPU datapath with Jump Address, Branch Address, and control unit]
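The decode-then-generate approach described above can be sketched as two combinational functions. Everything numeric here is an assumption labeled as such: the opcode and func. values are the standard MIPS ones, the ALUOp encoding (00 for LW/SW, 01 for BEQ, 10 for R-Type) follows the common textbook convention rather than anything recoverable from the slides, and don't-care outputs are arbitrarily driven to 0. The function names are hypothetical.

```python
# Assumed standard MIPS opcodes for the implemented subset.
OPCODES = {0: "R-Type", 35: "LW", 43: "SW", 4: "BEQ", 2: "J"}

# Assumed standard MIPS func. codes for the R-Type subset.
FUNC_TO_ACTION = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}


def main_control(opcode):
    """Decode the opcode into one-hot instruction signals, then OR them into controls."""
    kind = OPCODES[opcode]
    r_type, lw, sw, beq, j = (kind == k for k in ("R-Type", "LW", "SW", "BEQ", "J"))
    return {
        "RegDst":   int(r_type),
        "ALUSrc":   int(lw or sw),
        "MemtoReg": int(lw),
        "RegWrite": int(r_type or lw),
        "MemRead":  int(lw),
        "MemWrite": int(sw),
        "Branch":   int(beq),
        "Jump":     int(j),
        # Assumed encoding: 0b10 = R-Type, 0b01 = BEQ, 0b00 = LW/SW (and don't care for J).
        "ALUOp":    0b10 if r_type else 0b01 if beq else 0b00,
    }


def alu_action(alu_op, func):
    """ALU control: only R-Type instructions need to look at the func. bits."""
    if alu_op == 0b00:
        return "add"        # LW/SW: compute the effective address
    if alu_op == 0b01:
        return "sub"        # BEQ: subtract and examine the zero output
    return FUNC_TO_ACTION[func]
```

Since both functions depend only on the current instruction bits and hold no state, they correspond to purely combinational logic.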

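Control Signal Logic
[Figure: Op[5], Op[4], Op[3], Op[2], Op[1], Op[0] feed a Decoder that produces R-Type, LW, SW, BEQ, and J signals; these are combined to generate Jump, Branch, ALUSrc, RegWrite, MemRead, MemWrite, ALUOp[1], and ALUOp[0]]

Credits
These slides were derived from Gandhi Puvvada's EE 457 Class Notes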