ECE 250 / CPS 250 Computer Architecture. Processor Design Datapath and Control


ECE 250 / CPS 250 Computer Architecture
Processor Design: Datapath and Control
Benjamin Lee
Slides based on those from Andrew Hilton (Duke), Alvy Lebeck (Duke), Benjamin Lee (Duke), and Amir Roth (Penn)

Where We Are in This Course Right Now
- So far:
  - We know what a computer architecture is
  - We know what kinds of instructions it might execute
  - We know how to perform arithmetic and logic in an ALU
- Now:
  - We learn how to design a processor in which the ALU is just one component
  - Processor must be able to fetch instructions, decode them, and execute them
  - There are many ways to do this, even for a given ISA
- Next:
  - We learn how to design memory systems
from Roth ECE250

This Unit: Processor Design
[Figure: system layer stack: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
- Datapath components and timing
  - Registers and register files
  - Memories (RAMs)
- Mapping an ISA to a datapath
- Control
- Exceptions

Readings
- Patterson and Hennessy, Chapter 4: Sections 4.1-4.4
- Read this chapter carefully
  - It has many more examples than I can cover in class

So You Have an ALU
- Important reminder: a processor is just a big finite state machine (FSM) that interprets some ISA
- Start with one instruction: add $3,$2,$4
- ALU performs just a small part of the execution of an instruction
  - You have to read and write registers
  - You have to fetch the instruction to begin with
- What about loads and stores?
  - Need some sort of memory interface
- What about branches?
  - Need some hardware for that, too

Datapath and Control
[Figure: fetch (PC, insn memory), datapath (register file, data memory), and a control block]
- Fetch: get instruction, translate into control
- Control: which registers read/write, which ALU operation
- Datapath: registers, memories, ALUs (computation)
- Processor cycle: Fetch, Decode, Execute

Building a Processor for an ISA
- Fetch is pretty straightforward
  - Just need a register (called the Program Counter or PC) to hold the next address to fetch from instruction memory
  - Provide address to instruction memory; instruction memory provides the instruction at that address
- Let's start with the datapath
  1. Look at ISA
  2. Make sure datapath can implement every instruction

Datapath for MIPS ISA
- Consider only the following instructions
    add $1,$2,$3
    addi $1,$3,2
    lw $1,4($3)
    sw $1,4($3)
    beq $1,$2,PC_relative_target
    j Absolute_target
- Why only these?
  - Most other instructions are similar from the datapath viewpoint
  - I leave the ones that aren't for you to figure out

Review: A Register
[Figure: array of D flip-flops (DFFs) with inputs D0..DN-1 and outputs Q0..QN-1, sharing CLK and WE; shorthand symbol is an N-bit register with D, Q, and WE]
- Register: DFF array with shared clock, write-enable (WE)
  - Notice: both a clock and a WE (DFF WE = clock & register WE)
  - Convention I: clock represented by wedge
  - Convention II: if no WE, DFF is written on every clock

Uses of Registers: Program Counter
[Figure: fetch (PC, insn memory), datapath (register file, data memory), and a control block]
- A single register is good for some things
  - PC: program counter
  - Other things which aren't the ISA registers (more later in semester)

Uses of Registers: Architected Registers
[Figure: register file with inputs RS1, RS2, RD, RDVAL, WE and outputs RS1VAL, RS2VAL; RS = source reg, RD = dest reg]
- Register file: the ISA ("architectural", "visible") registers
  - Two read ports + one write port
  - Maximum number of reads/writes in a single instruction (R-type)
- Port: wires for accessing an array of data
  - Data bus: width of data element (MIPS: 32 bits)
  - Address bus: width of log2(number of elements) (MIPS: 5 bits)
  - Write enable: if it's a write port
  - M ports = M parallel and independent accesses

A Register File With Four Registers

Add a Read Port for RS1
[Figure: four registers feeding a 4-to-1 mux with select input RS1 and output RS1VAL]
- Output of each register into 4-to-1 mux (RS1VAL)
- RS1 is select input of RS1VAL mux

Add Another Read Port for RS2
[Figure: four registers feeding two 4-to-1 muxes, selected by RS1 and RS2, with outputs RS1VAL and RS2VAL]
- Output of each register into another 4-to-1 mux (RS2VAL)
- RS2 is select input of RS2VAL mux

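Add a Write Port for RD
[Figure: 2-to-4 decoder on RD, gated with WE, driving each register's write enable; RDVAL wired to every register input; read ports RS1/RS1VAL and RS2/RS2VAL as before]
- Input RDVAL into each register
- Enable only one register's WE: (Decoded RD) & (WE)
- What if we needed two write ports?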
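The four-register file built up over the last few slides can be sketched behaviorally in Python. This is only an illustrative model, not the course's implementation: the class name `RegisterFile` and its method names are assumptions, list indexing stands in for the read muxes, and the `(i == rd) and we` test stands in for the decoded RD line ANDed with WE.

```python
class RegisterFile:
    """Behavioral sketch: two mux-style read ports, one decoded write port."""

    def __init__(self, num_regs=4, width=32):
        self.regs = [0] * num_regs
        self.mask = (1 << width) - 1

    def read(self, rs1, rs2):
        # Each read port is a mux: the register number is the select input.
        return self.regs[rs1], self.regs[rs2]

    def write(self, rd, rdval, we):
        # RDVAL reaches every register; only the register whose decoded
        # RD line is high AND whose WE is high actually latches it.
        for i in range(len(self.regs)):
            if (i == rd) and we:
                self.regs[i] = rdval & self.mask


rf = RegisterFile()
rf.write(rd=2, rdval=7, we=True)
rf.write(rd=3, rdval=9, we=False)  # WE low: no register changes
print(rf.read(2, 3))
```

A second write port would need a second decoder and a per-register choice between the two write values, which is one way to think about the question the slide poses.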

Another Read Port Implementation
- A read port that uses muxes is fine for 4 registers
  - Not so good for 32 registers (32-to-1 mux is very slow)
- Alternative implementation uses tri-state buffers
  - Truth table (E = enable, D = input, Q = output)
      E  D | Q
      1  D | D
      0  D | Z
  - Z: high impedance state, no current flowing
- Mux: connect multiple tri-stated buses to one output bus
  - Key: only one input driving at any time, all others must be in Z
  - Else, all hell breaks loose (electrically)
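The tri-state truth table and the one-driver rule can be sketched as follows. This is a software analogy only: `None` stands in for the high-impedance state Z, and the helper names `tristate` and `bus` are hypothetical.

```python
def tristate(d, e):
    # Enabled buffer passes D through; disabled buffer floats (None models Z).
    return d if e else None


def bus(drivers):
    """Resolve a shared bus from (d, e) pairs; at most one driver may be enabled."""
    driven = [v for v in (tristate(d, e) for d, e in drivers) if v is not None]
    if len(driven) > 1:
        # In hardware this is an electrical fight, not a tidy error.
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0] if driven else None


# Three registers share one read bus; only the second is enabled.
print(bus([(5, False), (9, True), (3, False)]))
```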

Register File With Tri-State Read Ports
[Figure: register file with ports RDVAL, RS1VAL, RS2VAL, WE, RD, RS1, RS2; read ports built from tri-state buffers]

Another Useful Component: Memory
[Figure: memory block with ADDRESS, DATAIN, DATAOUT, WE]
- Memory: where instructions and data reside
  - One address bus
  - One input data bus for writes
  - One output data bus for reads
- Actually, a more traditional definition of memory is
  - One input/output data bus
  - No clock: asynchronous "strobe" instead

Let's Build A MIPS-like Datapath

Start With Fetch
[Figure: PC, +4 incrementer, instruction memory]
- PC and instruction memory
- A +4 incrementer computes default next instruction PC
- Why +4 (and not +1)?

First Instruction: add $rd, $rs, $rt
[Figure: fetch path plus register file (ports s1, s2, d, WE) and ALU computing rs + rt]
R-type: Op(6) | rs(5) | rt(5) | rd(5) | Sh(5) | Func(6)
- Add register file and ALU
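To make the R-type field layout concrete, here is a small Python sketch that slices a 32-bit instruction word into the six fields listed above. The function name `decode_rtype` is an assumption for illustration; the example word is the standard MIPS encoding of `add $3,$2,$4`.

```python
def decode_rtype(insn):
    """Split a 32-bit R-type word into Op(6) rs(5) rt(5) rd(5) Sh(5) Func(6)."""
    return {
        "op": (insn >> 26) & 0x3F,
        "rs": (insn >> 21) & 0x1F,   # first source register -> read port s1
        "rt": (insn >> 16) & 0x1F,   # second source register -> read port s2
        "rd": (insn >> 11) & 0x1F,   # destination register -> write port d
        "sh": (insn >> 6) & 0x1F,
        "func": insn & 0x3F,
    }


fields = decode_rtype(0x00441820)  # add $3,$2,$4
print(fields)
```

In the datapath these shifts and masks cost nothing: each field is simply a fixed bundle of wires from the instruction memory output.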

Second Instruction: addi $rt, $rs, imm
[Figure: datapath with sign extension (SX) unit producing Extended(imm) for the second ALU input]
I-type: Op(6) | rs(5) | rt(5) | Immed(16)
- Destination register can now be either rd or rt
- Add sign extension unit and mux into second ALU input
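A minimal sketch of what the sign extension unit and the new ALU input mux do, assuming 32-bit values held as unsigned Python integers. The names `sign_extend16` and `alu_input_b` are hypothetical, and the convention that a select value of 1 picks the immediate is an assumption made for this example.

```python
def sign_extend16(imm):
    """Replicate bit 15 of a 16-bit immediate into the upper 16 bits."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if (imm & 0x8000) else imm


def alu_input_b(rt_val, imm16, alu_in_b):
    # Mux on the second ALU input: register value or extended immediate.
    return sign_extend16(imm16) if alu_in_b else rt_val


print(hex(sign_extend16(0x0004)))  # positive immediate is unchanged
print(hex(sign_extend16(0xFFFC)))  # -4 keeps its sign in 32 bits
```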

Third Instruction: lw $rt, imm($rs)
[Figure: datapath with data memory (ports a, d) added after the ALU]
I-type: Op(6) | rs(5) | rt(5) | Immed(16)
- Add data memory, address is ALU output (rs+imm)
- Add register write data mux to select memory output or ALU output

Fourth Instruction: sw $rt, imm($rs)
[Figure: datapath with a path from the register file's second read output to the data memory data input]
I-type: Op(6) | rs(5) | rt(5) | Immed(16)
- Add path from second input register to data memory data input
- Disable RegFile's WE signal

Fifth Instruction: beq $1,$2,target
[Figure: datapath with a << 2 shifter and a second adder on the PC path; ALU zero output labeled z]
I-type: Op(6) | rs(5) | rt(5) | Immed(16)
- Add left shift unit (why?) and adder to compute PC-relative branch target
- Add mux to do what?
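One way to read the two questions on this slide, using the standard MIPS convention that the branch immediate counts instructions (words) rather than bytes: the shift converts words to bytes, and the mux chooses between PC+4 and the branch target. The sketch below assumes that convention; the function names are hypothetical.

```python
def sx16(imm):
    """Interpret a 16-bit field as a signed Python integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc_beq(pc, imm16, rs_val, rt_val):
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Shift left by 2: the immediate is a word offset, the PC is a byte address.
    target = (pc_plus_4 + (sx16(imm16) << 2)) & 0xFFFFFFFF
    zero = (rs_val - rt_val) == 0          # ALU subtracts and reports zero
    return target if zero else pc_plus_4   # the new mux on the PC input


print(hex(next_pc_beq(0x1000, 3, 5, 5)))       # taken, forward
print(hex(next_pc_beq(0x1000, 0xFFFE, 5, 5)))  # taken, backward
print(hex(next_pc_beq(0x1000, 3, 5, 6)))       # not taken
```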

Sixth Instruction: j
[Figure: datapath with a second << 2 shifter and an additional mux on the PC input]
J-type: Op(6) | Immed(26)
- Add shifter to compute left shift of 26-bit immediate
- Add additional PC input mux for jump target
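The slide only mentions the shift; in standard MIPS the remaining upper four bits of the jump target come from PC+4. The sketch below assumes that standard behavior, and the name `jump_target` is hypothetical.

```python
def jump_target(pc, imm26):
    """Absolute jump target: upper 4 bits of PC+4, then the 26-bit field << 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    return (pc_plus_4 & 0xF0000000) | ((imm26 & 0x3FFFFFF) << 2)


print(hex(jump_target(0x00400000, 0x0100004)))
print(hex(jump_target(0x10000000, 0x0100004)))
```

Because the shifted field is 28 bits wide, a jump can only reach addresses within the same 256 MB region as the instruction that follows it.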

Seventh, Eighth, Ninth Instructions
- Are these the paths we would need for all instructions?
- sll $1,$2,4 // shift left logical
  - Like an arithmetic operation, but need a shifter too
- slt $1,$2,$3 // set less than (slt)
  - Like subtract, but need to write the condition bits, not the result
  - Need zero extension unit for condition bits
  - Need additional input to register write data mux
- jal absolute_target // jump and link
  - Like a jump, but also need to write PC+4 into $ra ($31)
  - Need path from PC+4 adder to register write data mux
  - Need to be able to specify $31 as an implicit destination
- jr $31 // jump register
  - Like a jump, but need path from register read to PC write mux

Clock Timing
- Must deliver clock(s) to avoid races
- Can't write and read same value at same clock edge
  - Particularly a problem for RegFile and Memory
  - May create multiple clock edges (from single input clock) by using buffers (to delay clock) and inverters

This Unit: Processor Design
[Figure: system layer stack: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
- Datapath components and timing
  - Registers and register files
  - Memories (RAMs)
  - Clocking strategies
- Mapping an ISA to a datapath
- Control
- Exceptions

What Is Control?
[Figure: full datapath annotated with control signals BR, JP, Rwd, Rwe, ALUop, DMwe, Rdst, ALUinB]
- 9 signals control flow of data through this datapath
  - MUX selectors, or register/memory write enable signals
- Datapath of current microprocessor has 100s of control signals

Example: Control for add
[Figure: full datapath annotated with control signal values]
BR=0, JP=0, Rwd=0, Rwe=1, ALUop=0, DMwe=0, Rdst=1, ALUinB=0

Example: Control for sw
[Figure: full datapath annotated with control signal values]
BR=0, JP=0, Rwd=X, Rwe=0, ALUop=0, DMwe=1, Rdst=X, ALUinB=1
- Difference between a sw and an add is 5 signals
  - 3 if you don't count the X ("don't care") signals

Example: Control for beq $1,$2,target
[Figure: full datapath annotated with control signal values]
BR=1, JP=0, Rwd=X, Rwe=0, ALUop=1, DMwe=0, Rdst=X, ALUinB=0
- Difference between a store and a branch is only 4 signals
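The signal counts quoted on the sw and beq slides can be checked mechanically. The sketch below copies the three control settings from the example slides; the helper name `diff` and the use of the string "X" for don't-care values are choices made for this illustration.

```python
ADD = dict(BR=0, JP=0, Rwd=0, Rwe=1, ALUop=0, DMwe=0, Rdst=1, ALUinB=0)
SW = dict(BR=0, JP=0, Rwd="X", Rwe=0, ALUop=0, DMwe=1, Rdst="X", ALUinB=1)
BEQ = dict(BR=1, JP=0, Rwd="X", Rwe=0, ALUop=1, DMwe=0, Rdst="X", ALUinB=0)


def diff(a, b, count_x=True):
    """Names of signals whose values differ; optionally ignore don't-cares."""
    return sorted(
        s for s in a
        if a[s] != b[s] and (count_x or "X" not in (a[s], b[s]))
    )


print(len(diff(ADD, SW)), diff(ADD, SW))
print(len(diff(ADD, SW, count_x=False)), diff(ADD, SW, count_x=False))
print(len(diff(SW, BEQ)), diff(SW, BEQ))
```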

How Is Control Implemented?
[Figure: full datapath with an unspecified "Control?" block driving BR, JP, Rwd, Rwe, ALUop, DMwe, Rdst, ALUinB]

Implementing Control
- Each instruction has a unique set of control signals
  - Most signals are a function of the opcode
  - Some may be encoded in the instruction itself
    - E.g., the ALUop signal is some portion of the MIPS Func field
    (+) Simplifies controller implementation
    (-) Requires careful ISA design
- Options for implementing control
  1. Use instruction type to look up control signals in a table
  2. Design FSM whose outputs are control signals
- Either way, goal is same: turn instruction into control signals

Control Implementation: ROM
- ROM (read only memory): like a RAM but unwritable
  - Bits in data words are control signals
  - Lines indexed by opcode
- Example: ROM control for our simple datapath

  opcode  BR  JP  ALUinB  ALUop  DMwe  Rwe  Rdst  Rwd
  add      0   0     0      0      0    1     1    0
  addi     0   0     1      0      0    1     1    0
  lw       0   0     1      0      0    1     0    1
  sw       0   0     1      0      1    0     0    0
  beq      1   0     0      1      0    0     0    0
  j        0   1     0      0      0    0     0    0
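A control ROM is just a lookup table, so it can be sketched as a Python dictionary. The rows below are copied as they appear in the slide's table (don't-care entries are stored as 0, as in the table); the names `SIGNALS`, `CONTROL_ROM`, and `control_for` are hypothetical, and a real ROM would be indexed by the 6-bit opcode rather than by mnemonic.

```python
SIGNALS = ("BR", "JP", "ALUinB", "ALUop", "DMwe", "Rwe", "Rdst", "Rwd")

# One ROM line per opcode; each line is one control word.
CONTROL_ROM = {
    "add":  (0, 0, 0, 0, 0, 1, 1, 0),
    "addi": (0, 0, 1, 0, 0, 1, 1, 0),
    "lw":   (0, 0, 1, 0, 0, 1, 0, 1),
    "sw":   (0, 0, 1, 0, 1, 0, 0, 0),
    "beq":  (1, 0, 0, 1, 0, 0, 0, 0),
    "j":    (0, 1, 0, 0, 0, 0, 0, 0),
}


def control_for(opcode):
    """Read one ROM line and name its bits."""
    return dict(zip(SIGNALS, CONTROL_ROM[opcode]))


print(control_for("lw"))
```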

ROM vs. Combinational Logic
- A control ROM is fine for 6 insns and 9 control signals
- A real machine has 100+ insns and 300+ control signals
  - Even RISCs have lots of instructions
  - 30,000+ control bits (~4KB)
  - Not huge, but hard to make fast
  - Control must be faster than datapath
- Alternative: combinational logic
  - Exploits observation: many signals have few 1s or few 0s
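The ROM size estimate follows from multiplying the two lower bounds on the slide. A quick check, using exactly 100 instructions and 300 signals as the assumed values:

```python
insns = 100      # ROM lines (lower bound from the slide)
signals = 300    # bits per line (lower bound from the slide)

bits = insns * signals
size_bytes = bits // 8

print(bits, "bits")
print(size_bytes, "bytes")  # a little under 4 KB
```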

Control Implementation: Combinational Logic
- Example: combinational logic control for our simple datapath
[Figure: logic network from opcode lines (add, addi, lw, sw, beq, j) to control signals BR, JP, DMwe, Rwe, Rwd, Rdst, ALUop, ALUinB]

Datapath and Control Timing
[Figure: datapath with a control block (ROM or combinational logic), annotated with the operations performed within one cycle]
- Read IMem
- Read Registers (Read Control ROM)
- Read DMEM
- Write DMEM
- Write Registers
- Write PC

Single-Cycle Performance
- Useful metric: cycles per instruction (CPI)
  (+) Easy to calculate for single-cycle processor: CPI = 1
  - Seconds/program = (insns/program) * (1 cycle/insn) * (N seconds/cycle)
  - ICQ: How many cycles/second in a 3.8 GHz processor?
  (-) Slow!
- Clock period must be elongated to accommodate longest operation
  - In our datapath: lw
  - Goes through five structures in series: insn mem, register file (read), ALU, data mem, register file again (write)
- No one will buy a machine with a slow clock
  - Later: faster processor cores
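The performance equation can be worked through numerically. In the sketch below, the 3.8 GHz clock is the figure from the in-class question, while the one-billion-instruction program is a made-up value for illustration; the function name `seconds_per_program` is hypothetical.

```python
def seconds_per_program(insns, cpi, clock_hz):
    """Seconds/program = insns/program * cycles/insn * seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_hz
    return insns * cpi * seconds_per_cycle


clock_hz = 3.8e9  # 3.8 GHz means 3.8 billion cycles per second
runtime = seconds_per_program(insns=1_000_000_000, cpi=1, clock_hz=clock_hz)

print(f"{clock_hz:.2e} cycles/second")
print(f"{runtime:.3f} seconds")
```

In a single-cycle design CPI is fixed at 1, so the only lever left in this equation is the clock period, which is set by the slowest instruction.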


Program Execution
- Threads of control
  - Multiple threads, programs run within a system
  - Each thread, program has its own program counter
- Program execution
  - Fetch instruction from memory with address in PC
  - Execute instruction
  - Increment PC
- Begin PC at known location after system start-up
  - Load the operating system kernel

Execution Context
- Program context defined by processor state
  - General purpose registers (integer, floating-point)
  - Status registers (e.g., condition codes)
  - Program counter, stack pointer
  - Memory hierarchy

Context Switches
- Context switches (Motivation)
  - Allows programs to share machine, increases machine utilization
  - OS schedules, switches between multiple programs
  - Permits different execution modes (e.g., user versus OS kernel)
- Context switches (Mechanism)
  - Save current context, restore next context
- Context switches (Triggers)
  - User <-> User: Timeslice for multiple programs (e.g., 10-100 ms)
  - User <-> OS: Invoke OS to handle external events (e.g., I/O)

Interrupts and Exceptions
- Exception is an external event requiring response
  - Clock interrupts for time-slicing, context switches
  - I/O operations for network, disk, keyboard, etc.
  - Exception, interrupt, trap are used interchangeably
- Exceptions are infrequent
  - Input/Output, illegal instruction, divide-by-zero, page fault, protection fault, ctrl-c, ctrl-z, timer
- Exception handling requires OS
  - OS kernel includes instructions for exception handling
  - Exception handling is transparent to application code
  - Handlers fix & restart (e.g., I/O), or terminate (e.g., illegal insn)

Detecting Exceptions
- Undefined Instruction
  - Detect unknown opcodes
- Arithmetic Exceptions
  - Add logic in ALU to detect overflow
  - Add logic in divider to detect divide-by-zero
- Unaligned Access
  - Add circuit to check addresses
  - Word-aligned addresses have {00} in the least significant bits
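Two of these checks are simple enough to sketch in a few lines. The alignment test follows directly from the slide; the overflow test uses the common sign-comparison rule for 32-bit two's-complement addition, which is an assumption about how the ALU logic might be built. Both function names are hypothetical.

```python
def is_word_aligned(addr):
    # Word-aligned addresses have 00 in the two least significant bits.
    return (addr & 0b11) == 0


def add_overflows(a, b):
    """Signed overflow for 32-bit patterns a and b: both operands have the
    same sign bit and the sum's sign bit differs from it."""
    s = (a + b) & 0xFFFFFFFF
    return ((a ^ s) & (b ^ s) & 0x80000000) != 0


print(is_word_aligned(0x1000), is_word_aligned(0x1002))
print(add_overflows(0x7FFFFFFF, 1), add_overflows(0xFFFFFFFF, 1))
```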

Handling Exceptions
- Identify interrupt's cause
- Invoke OS routine
  - i = ID for interrupt's cause
  - PC = interrupt_table[i]
  - Kernel initializes table @ boot
- Clear interrupt signal
- Return from interrupt handler
  - Return to user context

Handling System Calls
- Special instruction invokes OS
  - Read or write I/O devices
  - Create new process
- Invoke OS routine
  - i = ID for syscall
  - PC = interrupt_table[i]
  - Kernel initializes table @ boot
- Clear interrupt signal
- Return from interrupt handler
  - Return to user context

Exceptions as Procedure Calls
1. Processor saves address of user instruction
   - Address of instruction stored in Exception Program Counter (EPC)
2. Processor transfers control to OS
   - Set PC to address of exception handler within OS code
3. OS executes handler, which resolves exception
4. OS returns to user program (EPC), or terminates program
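The numbered steps, together with the interrupt table from the preceding slides, can be sketched as a tiny state machine. Everything specific here is hypothetical: the class name `CPU`, the cause ID 1, and the handler address 0x80000100 are invented for illustration and do not come from any real MIPS configuration.

```python
class CPU:
    def __init__(self, interrupt_table):
        self.pc = 0
        self.epc = 0
        self.cause = 0
        self.interrupt_table = interrupt_table  # filled in by the kernel at boot

    def raise_exception(self, cause_id):
        self.epc = self.pc                        # 1. save user instruction address
        self.cause = cause_id                     #    record why we trapped
        self.pc = self.interrupt_table[cause_id]  # 2. transfer control to the OS

    def return_from_exception(self):
        self.pc = self.epc                        # 4. resume the user program


cpu = CPU(interrupt_table={1: 0x80000100})  # hypothetical cause ID and handler
cpu.pc = 0x00400010
cpu.raise_exception(1)
print(hex(cpu.pc), hex(cpu.epc))
cpu.return_from_exception()                 # step 3 (the handler) is elided
print(hex(cpu.pc))
```

The parallel with a procedure call is that EPC plays the role of the return address, except that the "call" is made by hardware rather than by an instruction the program chose to execute.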

Exceptions and Register Support
1. Exception Program Counter (EPC)
   - 32-bit register holds address of affected instruction
2. Cause Register
   - 32-bit register encodes exception cause
3. BadVAddr
   - 32-bit register holds address that triggers memory access exception
   - See memory hierarchy, virtual memory for detail
4. Status
   - 32-bit register tracks interrupt handling, multiple interrupts, etc.

Exception Handling
- What does exception handling look like to software?
- When exception happens
  - Control transfers to OS at pre-specified exception handler address
  - OS has privileged access to registers user processes do not see
    - These registers hold information about exception
    - Cause of exception (e.g., page fault, arithmetic overflow)
    - Other exception info (e.g., address that caused page fault)
    - PC of application insn to return to after exception is fixed
  - OS uses privileged (and non-privileged) registers to do its "thing"
  - OS returns control to user application
- Same mechanism available programmatically via SYSCALL

MIPS Exception Handling
- MIPS uses registers for state during exception handling
  - These registers live on "coprocessor 0"
  - $14: EPC (holds PC of user program during exception handling)
  - $13: exception type (SYSCALL, overflow, etc.)
  - $8: virtual address (that produced page/protection fault)
  - $12: exception mask (which exceptions trigger OS)
- Access registers with privileged instructions mfc0, mtc0
  - Privileged = user program cannot execute them
  - mfc0: move (register) from coprocessor 0 (to user reg)
  - mtc0: move (register) to coprocessor 0 (from user reg)
- Restore user mode with privileged instruction rfe
  - Kernel executes this instruction to restore user program

Implementing Exceptions
- Why do architects care about exceptions?
  - Because we use datapath and control to implement them
  - More precisely, to implement aspects of exception handling
    - Recognition of exceptions
    - Transfer of control to OS
    - Privileged OS mode
- Later in semester, we'll talk more about exceptions (b/c we need them for I/O)

Datapath with Support for Exceptions
[Figure: datapath extended with a processor status register (PSR; signals PSRs, PSRr) and a co-processor register file (signals CRwd, CRwe), plus additional muxes on the PC and ALU inputs]
- Co-processor register (CR) file needn't be implemented as RF
  - Independent registers connected directly to pertinent muxes
- PSR (processor status register): in privileged mode?

This Unit: Processor Design
[Figure: system layer stack: Application, OS, Compiler, Firmware, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
- Datapath components and timing
  - Registers and register files
  - Memories (RAMs)
  - Clocking strategies
- Mapping an ISA to a datapath
- Control
- Next up: Memory Systems

Summary
- We now know how to build a fully functional processor
- But
  - We're still treating memory as a black box
  - Our fully functional processor is slow. Really, really slow.