Chapter 4. The Processor. Computer Architecture and IC Design Lab


Chapter 4 The Processor

Introduction (4.1). CPU performance factors: CPU Time = Instruction Count × CPI × Clock Cycle Time. Instruction count is determined by the ISA and the compiler; CPI and cycle time are determined by the CPU hardware. We will examine two MIPS implementations: a simplified version (single-cycle implementation) and a more realistic pipelined version (the multi-cycle version is omitted here). We implement a simple subset that still shows most aspects. Memory reference: lw, sw. Arithmetic/logical: add, sub, and, or, slt. Control transfer: beq, j.
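As a rough illustration of how the three performance factors combine, the following sketch multiplies them out. The function name and the workload numbers (one million instructions, CPI of 1, an 800 ps clock period) are hypothetical and are not taken from the slides.

```python
def cpu_time_seconds(instruction_count, cpi, clock_cycle_time_s):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical workload: 1,000,000 instructions on a single-cycle design
# (CPI = 1) with an assumed 800 ps clock period.
single_cycle = cpu_time_seconds(1_000_000, 1.0, 800e-12)
print(f"{single_cycle * 1e6:.1f} microseconds")
```

In a single-cycle design the CPI is fixed at 1, so the only hardware lever left in this product is the clock cycle time.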

Review: MIPS Instruction Set Architecture (ISA). Instruction categories: arithmetic, load/store, jump and branch, floating point (coprocessor), memory management, special. Registers: R0 - R31, PC, HI, LO. Three instruction formats, all 32 bits wide:
R format: OP | rs | rt | rd | sa | funct
I format: OP | rs | rt | immediate
J format: OP | jump target
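The three formats can be made concrete by slicing a 32-bit word into its fields. This is a minimal sketch; the function name `decode` and the dictionary layout are illustrative choices, while the field widths (6/5/5/5/5/6, a 16-bit immediate, and a 26-bit jump target) are the standard MIPS ones.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every field it might hold.

    Which fields are meaningful depends on the format (R, I, or J).
    """
    return {
        "op": (word >> 26) & 0x3F,       # bits 31:26, all formats
        "rs": (word >> 21) & 0x1F,       # bits 25:21, R and I formats
        "rt": (word >> 16) & 0x1F,       # bits 20:16, R and I formats
        "rd": (word >> 11) & 0x1F,       # bits 15:11, R format
        "sa": (word >> 6) & 0x1F,        # bits 10:6,  R format (shift amount)
        "funct": word & 0x3F,            # bits 5:0,   R format
        "immediate": word & 0xFFFF,      # bits 15:0,  I format
        "target": word & 0x03FFFFFF,     # bits 25:0,  J format
    }


# add $t0, $s1, $s2 encodes as 0x02324020 (rs = 17, rt = 18, rd = 8).
fields = decode(0x02324020)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```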

Review: MIPS Register Convention
Name          Register number   Usage                    Preserve on call?
$zero         0                 constant 0 (hardware)    n.a.
$at           1                 reserved for assembler   n.a.
$v0 - $v1     2-3               returned values          no
$a0 - $a3     4-7               arguments                yes
$t0 - $t7     8-15              temporaries              no
$s0 - $s7     16-23             saved values             yes
$t8 - $t9     24-25             temporaries              no
$gp           28                global pointer           yes
$sp           29                stack pointer            yes
$fp           30                frame pointer            yes
$ra           31                return addr (hardware)   yes

Instruction Execution. The PC (program counter) is used to fetch the instruction from the instruction memory. After the instruction is fetched, the register numbers in the instruction are used to read registers in the register file. PC <- PC + 4 for sequential execution.

Different actions for different instruction classes. Use the ALU to calculate an arithmetic result, a memory address for load/store, or a branch target address. Access data memory for load/store. PC <- target address. Examples: add $t, $s, $s2; lw $s, 2($s2); bne $t, $s5, Exit; lw $s, 2($s2); j Loop

Multiplexers. We can't just join wires together; use multiplexers.

CPU Overview. We need three muxes to avoid problems. Details of each mux and of the control will be introduced later.

Logic Design Basics (4.2 Logic Design Conventions). Information is encoded in binary: low voltage = 0, high voltage = 1. One wire per bit; multi-bit data is encoded on multi-wire buses (for example, a 32-bit bus). Combinational elements operate on data: the output is a function of the input. State (sequential) elements store information: the output is a function of the input and the current state.

Review: Combinational Elements. AND gate: Y = A & B. Adder: Y = A + B. Multiplexer: Y = S ? I1 : I0. Arithmetic/Logic Unit: Y = F(A, B).

Review: Sequential Elements. Register: stores data in a circuit. It uses a clock signal to determine when to update the stored value. Edge triggered: the value updates when Clk changes (0 -> 1 or 1 -> 0). The figure on the slide is positive edge triggered: the value updates when Clk changes from 0 to 1.

Review: Sequential Elements (with write enable). Register with write control: it only updates on the clock edge when the write control input is 1. Used when the stored value is required later. Q is not changed when Write = 0.
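The behavior of a positive-edge-triggered register with a write-enable input can be sketched in a few lines. The class name `Register` and the `tick` method are hypothetical; the model only captures the rule described above (update on a 0 -> 1 clock transition, and only when Write is 1).

```python
class Register:
    """Positive-edge-triggered register with a write-enable input."""

    def __init__(self, value=0):
        self.q = value        # stored value (output Q)
        self._prev_clk = 0    # last clock level seen, used to detect edges

    def tick(self, clk, d, write=1):
        """Present clock level, data input D, and Write; return Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write == 1:
            self.q = d        # capture D only on a rising edge with Write = 1
        self._prev_clk = clk
        return self.q


reg = Register()
reg.tick(1, 5, write=0)   # rising edge, but Write = 0: Q stays unchanged
reg.tick(0, 5, write=1)   # falling edge: no update
reg.tick(1, 5, write=1)   # rising edge with Write = 1: Q becomes 5
print(reg.q)
```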

Building a Datapath (4.3). Datapath: the elements that process data and addresses in the CPU (registers, ALUs, muxes, memories, ...). We will show how to build a MIPS datapath.

Instruction Fetch. The PC is a 32-bit register. It is incremented by 4 for the next instruction.

R-Format Instructions. Read two register operands, perform an arithmetic/logical operation, and write the result into the destination register. Example: add $t, $s, $s2. [Figure: register file (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2, RegWrite) feeding the ALU (3-bit ALU operation, Zero, ALU result).]

Load/Store Instructions (need 4 components). Read register operands => register file. Calculate the address using the 16-bit offset: use the ALU, but sign-extend the offset first. Load: read memory and update the register. Store: write the register value to memory. Both need the data memory. Examples: lw $t, 4($s3) # load word from memory; sw $t, 8($s3) # store word to memory

Datapath: Load/Store Instructions. Two additional elements (sign extension and data memory) are used to implement loads and stores. [Figure: register file, ALU, 16-to-32-bit sign-extension unit, and data memory with the RegWrite, ALU operation, MemWrite, and MemRead control inputs.]

Animating the Datapath: load, e.g. lw $t, 4($s3). lw rt, offset(rs): R[rt] <- MEM[R[rs] + s_extend(offset)]. Legend: RN1: register number 1; RN2: register number 2; WN: register number that will be written; WD: write data. [Figure: the instruction fields (op, rs, rt, offset/immediate) drive the register file, the 16-to-32-bit sign-extension unit, the ALU, and the data memory.]

Animating the Datapath: store, e.g. sw $t, 8($s3). sw rt, offset(rs): MEM[R[rs] + sign_extend(offset)] <- R[rt]. [Figure: the instruction fields (op, rs, rt, offset/immediate) drive the register file, the 16-to-32-bit sign-extension unit, the ALU, and the data memory.]
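The two register-transfer statements for lw and sw can be mirrored directly in code. This is only a behavioral sketch: the register file is modeled as a Python list, the data memory as a dictionary keyed by byte address, and the helper names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lw(regs, mem, rt, rs, offset):
    """R[rt] <- MEM[R[rs] + s_extend(offset)]"""
    address = (regs[rs] + sign_extend16(offset)) & 0xFFFFFFFF
    regs[rt] = mem.get(address, 0)   # unwritten locations read as 0 in this model


def sw(regs, mem, rt, rs, offset):
    """MEM[R[rs] + s_extend(offset)] <- R[rt]"""
    address = (regs[rs] + sign_extend16(offset)) & 0xFFFFFFFF
    mem[address] = regs[rt]


regs = [0] * 32
mem = {}
regs[19] = 0x1000            # base address held in $s3 (register 19)
regs[8] = 42                 # value to store, held in $t0 (register 8)
sw(regs, mem, 8, 19, 8)      # store to 0x1008
lw(regs, mem, 9, 19, 8)      # load it back into $t1 (register 9)
print(hex(max(mem)), regs[9])
```

Both instructions use the ALU for the same add, which is why they share one ALUOp encoding in the control tables later in the chapter.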

Review: Specifying Branch Destinations. MIPS conditional branch instructions have the format op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits). They use PC-relative addressing: Target address = PC + offset × 4, where the PC has already been incremented by 4 by this time (it holds the address of the next instruction), and the offset is taken from the low-order 16 bits of the branch instruction and sign-extended to 32 bits. The offset is multiplied by 4 because each word is 4 bytes. [Figure: one adder computes PC + 4 and a second adder adds the sign-extended, shifted offset; a worked beq example ends with branch destination address 2C.]

Datapath: Branch Instructions. Read the register operands. Compare the operands: use the ALU to subtract and check the Zero output. Calculate the target address: sign-extend the offset, shift it left 2 bits (word displacement), and add it to PC + 4 (already calculated by instruction fetch). See the animation in the next slide.
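The branch steps listed above can be sketched as two small functions. The names and the example values (a beq at address 0x20 with offset 2) are hypothetical and chosen only for illustration.

```python
def branch_target(pc, offset16):
    """Target = (PC + 4) + (sign-extended 16-bit offset shifted left by 2)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    offset = offset16 & 0xFFFF
    if offset & 0x8000:              # sign bit set: negative displacement
        offset -= 0x10000
    return (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF


def next_pc_beq(pc, rs_value, rt_value, offset16):
    """The ALU subtracts the operands; the branch is taken when Zero is 1."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & 0xFFFFFFFF


# Assumed example: beq at address 0x20 with a word offset of 2.
print(hex(next_pc_beq(0x20, 7, 7, 2)))   # operands equal: branch taken
print(hex(next_pc_beq(0x20, 7, 8, 2)))   # operands differ: fall through
```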

Animating the Datapath (beq), e.g. beq $s $t 2. beq rs, rt, offset: if (R[rs] == R[rt]) then PC <- PC+4 + s_extend(offset<<2). [Figure: the instruction fields (op, rs, rt, offset/immediate) drive the register file, the ALU (Zero output), the sign-extension unit, the shift-left-2 unit, and an adder that takes PC + 4 from the instruction-fetch datapath.]

Sign extension and shift left by 2 hardware. Simple hardware is used for sign extension and for shift left by 2. Sign-extension hardware: the sign-bit wire is replicated. [Figure: wiring from msb to lsb for the shift-left-by-2 unit and for the sign-extension unit.]
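Because both units are pure wiring, they can be modeled as rearrangements of bit strings rather than as arithmetic. In this sketch the bits are written msb first as Python strings; the function names are hypothetical, and the zero fill of the two low-order bits in the shifter is the usual convention rather than something stated on the slide.

```python
def sign_extend_wires(bits16):
    """Replicate the sign-bit wire 16 times in front of the 16 input wires."""
    if len(bits16) != 16:
        raise ValueError("expected 16 bits")
    return bits16[0] * 16 + bits16


def shift_left_2_wires(bits32):
    """Drop the two msb wires and tie the two new lsb wires to 0."""
    if len(bits32) != 32:
        raise ValueError("expected 32 bits")
    return bits32[2:] + "00"


extended = sign_extend_wires("1111111111111100")   # -4 as a 16-bit value
print(extended)
print(shift_left_2_wires(extended))                # -16 as a 32-bit value
```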

Composing the Elements. Make the datapath execute an instruction in one clock cycle. Each datapath element can only do one function at a time; hence, we need separate instruction and data memories. Use multiplexers where alternate data sources are used for different instructions.

R-Type/Load/Store Datapath (a single-cycle datapath). Correct control signals (RegWrite, ALUSrc, ALU operation, MemWrite, MemtoReg, MemRead) are needed to make sure the correct operation is performed.

Full Datapath (Single Cycle Datapath)

Control for the single-cycle CPU (4.4 A Simple Implementation Scheme). Designing a processor: first a single-cycle implementation (the datapath), then control for the single-cycle CPU. Control of CPU operations is split into an ALU controller and a main controller.

Next: Building Datapath With Control

Main Control and ALU Control. Main control: based on the opcode, it generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. ALU control: based on the 2-bit ALUOp and the 6-bit funct field of the instruction, the ALU control unit generates the 3-bit ALU control field. R-type format: op (bits 31:26) | rs | rt | rd | shamt | funct (bits 5:0). [Figure: the 6-bit opcode feeds the main control, which sends the 2-bit ALUOp to the ALU control unit; the ALU control unit also takes the 6-bit funct field and drives the ALU.]

Deciding ALU Control. Assume a 2-bit ALUOp derived from the opcode; combinational logic derives the ALU control.
opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                ?
sw       00      store word         XXXXXX   add                ?
beq      01      branch equal       XXXXXX   subtract           ?
R-type   10      add                100000   add                ?
                 subtract           100010   subtract           ?
                 AND                100100   AND                ?
                 OR                 100101   OR                 ?
                 set-on-less-than   101010   set-on-less-than   ?

ALU Control. The ALU control has four bits: Ainvert, Binvert, and Operation (2 bits).
Function           ALU control
AND                0000
OR                 0001
add                0010
subtract           0110
set-on-less-than   0111
NOR                1100
If NOR is not supported, we just need three bits (the MSB is always 0).

ALU Control. The ALU is used for: load/store (function = add), branch (function = subtract), and R-type (function depends on the funct field). Assume a 2-bit ALUOp derived from the opcode; combinational logic derives the ALU control.
opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
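The mapping from ALUOp and funct to the ALU control bits can be sketched as a small lookup. The function name is hypothetical, and the binary encodings are the standard ones from the P&H textbook (ALUOp 00 for lw/sw, 01 for beq, 10 for R-type), which is an assumption here because the transcript lost most of its digits.

```python
# funct field -> 4-bit ALU control, used only when ALUOp says "R-type".
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op, funct):
    """Return the 4-bit ALU control for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:               # lw / sw: address calculation
        return 0b0010                # add
    if alu_op == 0b01:               # beq: compare by subtracting
        return 0b0110                # subtract
    if alu_op == 0b10:               # R-type: decided by the funct field
        return FUNCT_TO_ALU_CONTROL[funct & 0x3F]
    raise ValueError("unsupported ALUOp")


print(format(alu_control(0b10, 0b101010), "04b"))   # slt
```

Splitting the decoding into two levels keeps the main control small: it only has to emit two bits, and the funct field is examined by the ALU control alone.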

Deciding Main Control Signals. [Figure: the instruction memory supplies Instruction<31:0>, which is split into Op <31:26>, Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>, and Funct <0:5>; the control block generates PCsrc, RegDst, ALUSrc, RegWr, MemRd, ALUctr (ALUOp), MemWr, and MemtoReg for the datapath, and the datapath returns Equal.]

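Review: The Main Control Unit. Control signals are derived from the instruction.
R-type:      0        | rs    | rt    | rd    | shamt | funct
             31:26      25:21   20:16   15:11   10:6    5:0
Load/Store:  35 or 43 | rs    | rt    | offset
             31:26      25:21   20:16   15:0
Branch:      4        | rs    | rt    | offset
             31:26      25:21   20:16   15:0
Bits 31:26 are the opcode; rs is always read; rt is read, except for load; the destination register (rd or rt) is written for R-type and load; the offset is sign-extended and added.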

Control Signals for R-Type Instruction. [Figure: full single-cycle datapath with the control signals (RegDst, ALUSrc, Operation, PCSrc, RegWrite, MemWrite, MemRead, MemtoReg) shown in blue; the ALU Operation value depends on funct.]

Control Signals: lw Instruction. [Figure: full single-cycle datapath with the control signals RegDst, ALUSrc, Operation, PCSrc, RegWrite, MemWrite, MemRead, and MemtoReg set for lw.]

Control Signals: sw Instruction. [Figure: full single-cycle datapath with the control signals RegDst, ALUSrc, Operation, PCSrc, RegWrite, MemWrite, MemRead, and MemtoReg set for sw.]

Control Signals: beq Instruction. PCSrc = 1 if Zero = 1. [Figure: full single-cycle datapath with the control signals RegDst, ALUSrc, Operation, PCSrc, RegWrite, MemWrite, MemRead, and MemtoReg set for beq.]

Review: Target address of Jump. Assume PC=4 6, what is the target address of the jump instruction? Instruction format: op (6 bits) | address (26 bits). Address in the instruction= x2 Target Address= PC[31:28]+2 6 *4= x484

Review: Implementing Jumps. Jump format: 2 (bits 31:26) | address (bits 25:0). Jump uses a word address. Update the PC with the concatenation of the top 4 bits of the old PC (PC+4[31:28]), the 26-bit jump address, and 00. This needs an extra control signal decoded from the opcode.
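The concatenation described above can be written as a mask and a shift. The function name and the example values are hypothetical; shifting the 26-bit address left by 2 is what appends the two 0 bits.

```python
def jump_target(pc, address26):
    """New PC = top 4 bits of (PC + 4), then the 26-bit address, then 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000            # bits 31:28 of PC + 4
    lower = (address26 & 0x03FFFFFF) << 2     # word address -> byte address
    return upper | lower


# Assumed example: a jump located at 0x40000000 with address field 2.
print(hex(jump_target(0x40000000, 0x0000002)))
```

Since only 26 bits come from the instruction, a jump can reach any word inside the current 256 MB region selected by the upper 4 PC bits, but not outside it.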

Datapath Executing j. [Figure: full single-cycle datapath with the control unit (inputs op I[31:26] and funct I[5:0], output ALUOp to the ALU control) extended for jumps: jmpaddr I[25:0] is shifted left by 2 to 28 bits and concatenated with PC+4[31-28], and an extra mux controlled by Jump selects the new PC after the Branch/PCSrc mux.]

Truth Table for Main Control Signals. The current design of the control is for lw, sw, beq, and, or, add, sub, slt. I format: lw, sw, beq. R format: and, or, add, sub, slt. Given the 4 opcodes (each 6 bits) as inputs, the outputs are as follows => a main control logic (the next slide). Inputs: the instruction class (R format, lw, sw, beq). Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0. See Appendix D for details.

Implementation of Main Control Block (use a PLA, programmable logic array). Truth table for main control signals:
          Signal name   R-format   lw   sw   beq
Inputs    Op5           0          1    1    0
          Op4           0          0    0    0
          Op3           0          0    1    0
          Op2           0          0    0    1
          Op1           0          1    1    0
          Op0           0          1    1    0
Outputs   RegDst        1          0    X    X
          ALUSrc        0          1    1    0
          MemtoReg      0          1    X    X
          RegWrite      1          1    0    0
          MemRead       0          1    0    0
          MemWrite      0          0    1    0
          Branch        0          0    0    1
          ALUOp1        1          0    0    0
          ALUOp0        0          0    0    1
[Figure: main control PLA with inputs Op5-Op0, one AND term per instruction class (R-format, lw, sw, beq), and OR outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.]
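A software analogue of the main control PLA is a table keyed by opcode. This sketch uses the standard MIPS opcodes (0 for R-format, 35 for lw, 43 for sw, 4 for beq) and the textbook signal values, which are assumptions here because the transcript lost its digits; don't-care outputs are represented as None, and the names are illustrative.

```python
MAIN_CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),      # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),      # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),      # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),      # beq
}


def main_control(opcode):
    """Look up the control signals for a 6-bit opcode (None = don't care)."""
    return MAIN_CONTROL[opcode & 0x3F]


signals = main_control(35)   # lw
print(signals["ALUSrc"], signals["MemRead"], signals["RegWrite"])
```

The don't-care entries are what let the PLA stay small: since sw and beq never write a register, RegDst and MemtoReg can take whichever value minimizes the logic.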

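Truth Table for ALU control signals. The lw and sw rows are merged. Inputs: ALUOp and the funct field; outputs: Operation (C3 C2 C1 C0).
ALUOp1   ALUOp0   F5   F4   F3   F2   F1   F0   C3 C2 C1 C0   Function
0        0        X    X    X    X    X    X    0  0  1  0    add
X        1        X    X    X    X    X    X    0  1  1  0    subtract
1        X        X    X    0    0    0    0    0  0  1  0    add
1        X        X    X    0    0    1    0    0  1  1  0    subtract
1        X        X    X    0    1    0    0    0  0  0  0    and
1        X        X    X    0    1    0    1    0  0  0  1    or
1        X        X    X    1    0    1    0    0  1  1  1    slt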

Implementation of ALU Control Block. C3 = 0. ALUOp0 = 1 is able to identify the branch instruction. C2 = ALUOp0 or (ALUOp1 and F1). [Figure: ALU control block with inputs ALUOp1, ALUOp0, and F(5-0) (F3, F2, F1, F0 are used) and outputs C3, C2, C1, C0 (Operation).]

Performance Issues. The longest delay determines the clock period. Critical path: the load instruction (instruction memory -> register file -> ALU -> data memory -> register file). It is not feasible to vary the period for different instructions. This violates the design principle of making the common case fast. We will improve performance by pipelining.
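The effect of the critical path on the clock period can be illustrated with a small calculation. All stage delays below are hypothetical round numbers (200 ps for memories and the ALU, 100 ps for register file access), not measurements from the slides; the per-instruction paths follow the datapath elements each class uses.

```python
# Assumed stage delays in picoseconds (illustrative values only).
DELAY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Datapath elements used by each instruction class.
PATHS = {
    "lw": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw": ["instr_mem", "reg_read", "alu", "data_mem"],
    "R-format": ["instr_mem", "reg_read", "alu", "reg_write"],
    "beq": ["instr_mem", "reg_read", "alu"],
}


def path_delay(instruction):
    """Total delay of the elements an instruction class passes through."""
    return sum(DELAY_PS[stage] for stage in PATHS[instruction])


# A single-cycle clock must be long enough for the slowest instruction.
clock_period_ps = max(path_delay(name) for name in PATHS)
for name in PATHS:
    print(name, path_delay(name), "ps")
print("clock period:", clock_period_ps, "ps")
```

With these assumed delays every instruction pays for the load path, even a beq that needs far less time, which is the inefficiency pipelining is meant to address.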
