Processor Design CSCE Instructor: Saraju P. Mohanty, Ph. D. NOTE: The figures, text etc included in slides are borrowed

Similar documents
Lecture 5: The Processor

Lecture 6: Pipelining

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CPE 335 Computer Organization. Basic MIPS Architecture Part I

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

CENG 3420 Lecture 06: Datapath

Review: Abstract Implementation View

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

EEC 483 Computer Organization

comp 180 Lecture 25 Outline of Lecture The ALU Control Operation & Design The Datapath Control Operation & Design HKUST 1 Computer Science

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

Single-Cycle Examples, Multi-Cycle Introduction

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Computer and Information Sciences College / Computer Science Department The Processor: Datapath and Control

Designing a Pipelined CPU

Chapter 4 (Part II) Sequential Laundry

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

What do we have so far? Multi-Cycle Datapath (Textbook Version)

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 3 & Appendix C Pipelining Part A: Basic and Intermediate Concepts

Ch 5: Designing a Single Cycle Datapath

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Pipelining: Basic Concepts

ﻪﺘﻓﺮﺸﻴﭘ ﺮﺗﻮﻴﭙﻣﺎﻛ يرﺎﻤﻌﻣ MIPS يرﺎﻤﻌﻣ data path and ontrol control

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W9-W

T = I x CPI x C. Both effective CPI and clock cycle C are heavily influenced by CPU design. CPI increased (3-5) bad Shorter cycle good

LECTURE 5. Single-Cycle Datapath and Control

ECE473 Computer Architecture and Organization. Processor: Combined Datapath

Pipelining. Chapter 4

The overall datapath for RT, lw,sw beq instrucution

CPE 335. Basic MIPS Architecture Part II

Processor (I) - datapath & control. Hwansoo Han

The Processor: Datapath & Control

Processor Architecture. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Chapter 4. The Processor

Chapter 4. The Processor

Computer Architecture. Lecture 6.1: Fundamentals of

Chapter 5: The Processor: Datapath and Control

Systems Architecture

Improve performance by increasing instruction throughput

Chapter Six. Dataı access. Reg. Instructionı. fetch. Dataı. Reg. access. Dataı. Reg. access. Dataı. Instructionı fetch. 2 ns 2 ns 2 ns 2 ns 2 ns

PS Midterm 2. Pipelining

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

What do we have so far? Multi-Cycle Datapath

Chapter 4 The Processor 1. Chapter 4A. The Processor

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12

Chapter 4. The Processor

Processor Architecture

ECE260: Fundamentals of Computer Engineering

COMPUTER ORGANIZATION AND DESIGN

CSE 2021 Computer Organization. Hugh Chesser, CSEB 1012U W12-M

Pipelining Analogy. Pipelined laundry: overlapping execution. Parallelism improves performance. Four loads: Non-stop: Speedup = 8/3.5 = 2.3.

CPU Organization (Design)

Advanced Computer Architecture Pipelining

ECE232: Hardware Organization and Design

CSE140: Components and Design Techniques for Digital Systems

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Outline Marquette University

Lecture 7 Pipelining. Peng Liu.

Topic #6. Processor Design

COMPUTER ORGANIZATION AND DESIGN

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

LECTURE 3: THE PROCESSOR

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Single Cycle Datapath

CPS104 Computer Organization and Programming Lecture 19: Pipelining. Robert Wagner

ECE331: Hardware Organization and Design

COMP2611: Computer Organization. The Pipelined Processor

Chapter 4. The Processor

COSC 6385 Computer Architecture - Pipelining

CS 152 Computer Architecture and Engineering Lecture 4 Pipelining

SI232 Set #20: Laundry, Co-dependency, and other Hazards of Modern (Architecture) Life. Chapter 6 ADMIN. Reading for Chapter 6: 6.1,

EE 457 Unit 6a. Basic Pipelining Techniques

Outline. Combinational Element. State (Sequential) Element. Clocking Methodology. Input/Output of Elements

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

4. What is the average CPI of a 1.4 GHz machine that executes 12.5 million instructions in 12 seconds?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ECE369. Chapter 5 ECE369

CS 110 Computer Architecture. Pipelining. Guest Lecture: Shu Yin. School of Information Science and Technology SIST

EC 413 Computer Organization - Fall 2017 Problem Set 3 Problem Set 3 Solution

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Pipelined Datapath. One register file is enough

The Pipelined MIPS Processor

COMPUTER ORGANIZATION AND DESIGN

Unpipelined Machine. Pipelining the Idea. Pipelining Overview. Pipelined Machine. MIPS Unpipelined. Similar to assembly line in a factory

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

CS359: Computer Architecture. The Processor (A) Yanyan Shen Department of Computer Science and Engineering

Single Cycle Datapath

ECE170 Computer Architecture. Single Cycle Control. Review: 3b: Add & Subtract. Review: 3e: Store Operations. Review: 3d: Load Operations

CC 311- Computer Architecture. The Processor - Control

1048: Computer Organization

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

Major CPU Design Steps

EECS 322 Computer Architecture Improving Memory Access: the Cache

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Transcription:

Lecture 3: General Purpose Processor Design CSCE 665 Advanced VLSI Systems Instructor: Saraju P. ohanty, Ph. D. NOTE: The figures, tet etc included in slides are borrowed from various books, websites, authors pages, and other sources for academic purpose only. The instructor does not claim any originality.

Lecture Outline General Purpose Processor Program Eecution Construction of a Simple IPS Processor Single Cycle Processor ulticycle Processor Pipelined Processor 2

High Level Language Program Levels of Representation Compiler Assembly Language Program temp = v[k]; v[k] = v[k+]; v[k+] = temp; lw $5, ($2) lw $6, 4($2) sw $6, ($2) Assembler sw $5, 4($2) achine Language Program achine Interpretation Control Signal Specification OP[:3] <= InstReg[9:] & ASK 3

Eecution Cycle Fetch Decode Operand Fetch Eecute Result Store Net Obtain instruction from program storage Determine required actions and instruction size Locate and obtain operand Compute value or status Deposit s in storage for later use Determine successor instruction 4

The Processor: path & Control Let us look at an implementation of the IPS Simplified to contain only: -reference instructions: lw, sw arithmetic-logical instructions: add, sub, and, or, slt control flow instructions: beq, j Generic Implementation: use the program counter (PC) to supply instruction address get the instruction from read registers use the instruction to decide eactly what to do 5

ore Implementation Details Abstract / Simplified View: Register # PC ress Registers Register # Register # ress Two types of functional units: elements that operate on values (combinational) elements that contain state (sequential) 6

Register File: Operation Built using D flip-flops (Combinational in nature) register number register number 2 Register Register Register n Register n u u R ead 2 7

Register File: Operation We still use the real clock to determine when to write C D Register Register number n-to- decoder n C D Register n Register C Register n D C D Register n 8

Register File: Block Diagram register number register number 2 register Register file 2 Three ress ports One Input Port Two Output t Ports One Control Signal 9

a: emory Functional Units - I After an instruction ti address is put, the instruction ti residing at the address appears at the output port b: Program Counter -- A simple upcounter c: er -- A 2 s complement adder address PC Sum a. b. Program counter c. er

a: Register File Functional Units - II It s construction, read, and write operations as discussed previously b: (Arithmetic ti & Logic Unit) Recall the design we have discussed in last two classes Note the Zero output control 5 3 register Register 5 numbers register 2 Zero Registers 5 register 2 Reg a. Register File b.

a: emory Unit Functional Units -III Similar to emory Unit, only that it can written into as well Two input ports for address and, one output port (for read out) Two control signals: for and operations b: Sign Etension Unit Etends 6-bit input operand to 32 bits em ress 6 Sig n 32 etend em a. m emory unit b. Sign-etension unit 2

path for Fetch (Piece I) Fetching s and Incrementing the program counter by 4 4 PC ad dre ss 3

path for R-type s (Piece II) path for R-type s register register 2 Registers register 2 3 operation Zero Reg 4

path for Load/Store (Piece III) path for load or store () Register Access; (2) emory ress calculation; (3) / (4) into Register file (if the instruction is a load) 3 operation register register 2 Zero Registers register 2 Reg ress em 6 Sign 32 etend em 5

path for Branch (Piece IV) Unit Shift left 2 adds at the low-order end of the sign-etended offset Control logic is used to decide whether the incremented PC or branch target should replace the PC based on the Zero output of the PC + 4 from instruction path Sum Branch target Shift left 2 register register 2 Registers register Reg 2 6 Sign 32 etend operation 3 To branch Zero control logic 6

path Construction Strategy Now, we have pieces of path that are capable of performing distinct functions We want to stitch them together to yield a final path that can eecute all the instructions (lw, sw, add, sub, and, or, slt, beq, j) We will use multipleors (or mues for short) for stitching the path 7

i register l path Construction (erge Pieces II & III) Piece II + Piece III ri r register register 2 Registers register 2 register register 2 Registers register Reg Reg 2 6 Sign 32 etend 3 operation Zero 3 operation Zero ress em em 8

And you will get.. Rule: Whenever we have more than one input feeding a functional unit, introduce a multipleor (this gives rise to a control signal, more later..) Registers register register 2 register 2 Reg Src u operation 3 em Zero ress emtoreg u 6 32 Sign etend em 9

path Construction contd (merge Piece I) Just tack the Fetch and PC increment logic at the front! 4 PC address Registers register register 2 register Reg 2 Src u operation 3 em Zero ress emtoreg u 6 Sign 32 etend em 2

atapath Construction (Finally, merge Piece IV) PCSrc u 4 Shift left 2 PC address Registers register register 2 register Reg 2 6 Sign 32 etend Src u 3 operation em Zero ress em emtoreg u 2

Final path u 4 Reg Shift left 2 PCSrc PC address [3 ] [25 2] [2 6] [5 ] [5 ] register Src register 2 Zero 2 u u RegDst flows through various paths under the influence of control signals There are seven control signals (of type,, or u Select) register Registers 6 Sign 32 etend ti [5 ] control Op em ress em emtoreg u 22

Defining the Control.. Selecting the operations to perform (, read/write, etc.) Controlling the flow of (multipleor inputs) Information comes from the 32 bits of the instruction Eample: add $8, $7, $8 Format: op rs rt rd shamt funct 's operation based on instruction type and function code We will design two control units: () Control to generate appropriate function select signals for the (2) ain Control to generate signals for functional units other than the 23

Control - Truth Table & Implementation Describe it using a truth table (can turn into gates): Op Funct field Operation Op Op F5 F4 F3 F2 F F X X X X X X X X X X X X X X X X X X X X X X X X X X X X Op Op Op control block F3 Operation2 F (5 ) F2 F F Operation Operation Operation 24

Designing the ain Control 4 [3 26] Control RegDst Branch em emtoreg Op em Src Reg Shift left 2 u PC address [3 ] [25 2] [2 6] [5 ] register register 2 Registers 2 register u u Zero ress u [5 ] 6 Sign 32 etend control [5 ] 25

Control Signals and their Effects Signal Name Effect When deasserted Effect when asserted RegDst The register destination number for the The register destination number for the register comes from the rt field (bits 2-6) register comes from the rd field (bits 5-) Reg NONE The register on the register input is written with the value on the input Src The second operand comes from the The second operand is the sign-etended second register file output ( 2) lower 6 bits of the instruction PCSrc The PC is replaced by the output of the adder that The PC is replaced by the output of the adder computes the value of PC + 4. that computes the branch target em None emory contents t designated d by the address input are put on the output em None contents designated by the address input are replaced by the value on the input emtoreg The value fed to the register input The value fed to the register input comes comes from the from the. 26

ain Control: Truth Table & Implementation emto- Reg em em RegDst Src Reg Branch Op p R-format lw sw X X beq X X Inputs Op5 O p 4 Op3 Op2 Op O p R -fo r m a t Iw s w b e q Outputs RegDst Src emtoreg R e g W rite em em Branch Op OpO 27

Our Simple Control Structure All of the logic is combinational We wait for everything to settle down, and the right thing to be done might not produce right answer right away we use write signals along with clock to determine when to write Cycle time determined by length of the longest path State element Combinational logic State element 2 Clock cycle We are ignoring some details like setup and hold times 28

We designed a Single Cycle Implementation Calculate cycle time assuming negligible delays ecept: (2ns), and adders (2ns), register file access (ns) PCSrc u 4 Reg Shift left 2 PC address [3 ] [25 2] [2 6] u [5 ] RegDst [5 ] register register 2 2 register Registers 6 Sign 32 etend Src u control Zero em ress em emtoreg u [5 ] Op 29

How does the single cycle path works? Let us understand this by highlighting the portions of the path when an R-type instruction is eecuted For an R-type instruction we go through the following phases: Phase : Fetch Phase 2: Register File Phase 3: eecution Phase 4: the Result into the Register File NOTE: All the four phases are completed in only ONE clock cycle and hence it is a single cycle implementation 3

R-type Phase ( Fetch) u 4 [3 26] Control RegDst Branch em emtoreg Op em Src Reg Shift left 2 PC address [3 ] ti [25 2] 2] [2 6] [5 ] register register 2 Registers 2 register u u Zero ress u [5 ] 6 Sign 32 etend control [5 ] 3

R-type Phase 2 (Register ) u 4 [3 26] Control RegDst Branch em emtoreg Op em Src Reg Shift left 2 PC address [3 ] ti [25 2] 2] [2 6] [5 ] register register 2 Registers 2 register u u Zero ress u [5 ] 6 Sign 32 etend control [5 ] 32

R-type Phase 3 ( eecution) u 4 [3 26] Control RegDst Branch em emtoreg Op em Src Reg Shift left 2 PC address [3 ] [25 2] [2 6] [5 ] register register 2 Registers 2 u register u Zero ress u [5 ] 6 Sign 32 etend control [5 ] 33

R-type Phase 4 ( the Result) u 4 [3 26] Control RegDst Branch em emtoreg Op em Src Reg Shift left 2 PC address [3 ] ti [25 2] 2] [2 6] [5 ] register register 2 Registers 2 register u u Zero ress u [5 ] 6 Sign 32 etend control [5 ] 34

How do we handle jump? [25 ] Shift Jump address [3 ] left 2 26 28 4 PC+4 [3 28] [3 26] Control RegDst Jump Branch em emtoreg Op em Src Reg Shift left 2 u u PC address [3 ] [25 2] [2 6] [5 ] register register 2 Zero Registers 2 u register u ress u [5 ] 6 Sign 32 etend control [5 ] 35

Where we are headed Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area One Solution: use a smaller ccletime cycle have different instructions take different numbers of cycles a multicycle path: 36

Single Cycle Implementation: Summary All instructions are eecuted in only clock cycle We built a single cycle path from scratch We designed appropriate controller to generate correct correct signals All instructions are not born equal; that some require more work, some less => disadvantage of single cycle implementation is that the slowest instruction determines the clock cycle width In reality, no body implements single cycle approach Suggestion: STARE, STARE, STARE at the single cycle path to familiarize yourself Given the single cycle path, you should be able to highlight active portions of the path for any given instruction ( just as we did for an R-type instruction in the class) 37

Single Cycle Implementation - Recap Single Cycle Problems: what if we had a more complicated instruction like floating point? wasteful of area Cycle width determined by the slowest instruction One Solution: use a smaller cycle time have different instructions ti take different numbers of cycles a multicycle path: 38

ulticycle Approach High Level View PC ress emory or register emory register Register # Registers Register # Register # A B Out 39

Single Cycle, ultiple Cycle, vs. Pipeline Clk Cycle Single Cycle Implementation: Cycle 2 Load Store Waste Clk Cycle Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle ultiple Cycle Implementation: Load Store Ifetch Reg Eec em Wr Ifetch Reg Eec em R-type Ifetch Pipeline Implementation: Load Ifetch Reg Eec em Wr Store Ifetch Reg Eec em Wr R-type Ifetch Reg Eec em Wr 4

Basic Idea IF: fetch u ID: decode/ register file read EX: Eecute/ address calculation E: emory access WB: back 4 Shift left 2 PC ress register register 2 Registers 2 register u Zero ress u 6 Sign etend 32 4

u Corrected path IF/ID ID/EX EX/E E/WB 4 Shift left 2 PC ress Instructio on register register 2 Registers 2 register u Zero ress u 6 Sign etend 32 42

PCSrc path with Control u Control ID/EX WB EX/E WB E/WB IF/ID EX WB PC 4 ress ruction Instr Reg register register 2 Registers 2 register Shift left 2 u Src Zero Branch em ress em mtoreg u [5 ] 6 32 Sign etend 6 control em [2 6] [5 ] Op u RegDst 43

Pipelining Summary Pipelining doesn t help latency of single task, it helps throughput of entire workload ultiple tasks operating simultaneously using different resources Potential speedup = Number pipe stages Pipeline rate limited by slowest pipeline stage Unbalanced lengths of pipe stages reduces speedup Time to fill pipeline and time to drain it reduces speedup Three types of pipeline hazards: structural,, and control/branch Stalling helps any kind of hazard hazard solutions: Stalling, forwarding, Hazard detection Control or Branch Hazard solutions: Stalling, Branch Prediction, Delayed Branching 44