The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Introduction
- CPU performance factors
  - Instruction count: determined by ISA and compiler
  - CPI and cycle time: determined by CPU hardware
- Two MIPS implementations
  - A simplified version
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j

Instruction Execution
- PC -> instruction memory, fetch instruction
- Register numbers -> register file, read registers
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - PC <- target address or PC + 4

CPU Overview

Multiplexers
- Can't just join wires together
  - Use multiplexers

Control (Chapter 4, The Processor)

Logic Design Basics
- Information encoded in binary
  - Low voltage = 0, high voltage = 1
  - One wire per bit
  - Multi-bit data encoded on multi-wire buses
- Combinational elements
  - Operate on data
  - Output is a function of input
- State (sequential) elements
  - Store information

Combinational Elements
- AND-gate: Y = A & B
- Adder: Y = A + B
- Multiplexer: Y = S ? I1 : I0
- Arithmetic/Logic Unit: Y = F(A, B)
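The four elements above can be modelled as pure functions, since a combinational output depends only on the current inputs. This is a minimal sketch, not part of the original slides: the 32-bit width, the function names, and the string selectors used for F are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit data paths, as in MIPS


def and_gate(a: int, b: int) -> int:
    """Y = A & B"""
    return a & b


def adder(a: int, b: int) -> int:
    """Y = A + B, wrapped to 32 bits like a hardware adder."""
    return (a + b) & MASK32


def mux(s: int, i0: int, i1: int) -> int:
    """Y = S ? I1 : I0"""
    return i1 if s else i0


def alu(f: str, a: int, b: int) -> int:
    """Y = F(A, B); the selector names are hypothetical."""
    ops = {
        "and": a & b,
        "or": a | b,
        "add": (a + b) & MASK32,
        "sub": (a - b) & MASK32,
    }
    return ops[f]
```

For example, `mux(1, 5, 9)` selects input I1, and `alu("sub", 5, 5)` yields 0, which is the condition a branch-on-equal checks.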

Sequential Elements
- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1
[Figure: register with inputs D and Clk and output Q, with a timing diagram of Clk, D, and Q]

Sequential Elements
- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later
[Figure: register with inputs D, Write, and Clk and output Q, with a timing diagram of Clk, Write, D, and Q]
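The behaviour of an edge-triggered register with write control can be sketched in a few lines. This is an illustrative model only; the class name `Register` and the method `tick` are assumptions, not something the slides define.

```python
class Register:
    """Edge-triggered register: Q changes only on a 0 -> 1 clock transition
    and only when the write control input is 1."""

    def __init__(self, value: int = 0) -> None:
        self.q = value          # stored value (output Q)
        self._prev_clk = 0      # last clock level seen, used to detect edges

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d          # capture D on the rising edge
        self._prev_clk = clk
        return self.q
```

Holding `clk` at 1 while D changes leaves Q untouched, which is the property that lets combinational logic settle between clock edges.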

Clocking Methodology
- Combinational logic transforms data during clock cycles
  - Between clock edges
  - Input from state elements, output to state element
  - Longest delay determines clock period

Building a Datapath
- Datapath
  - Elements that process data and addresses in the CPU
  - Registers, ALUs, muxes, memories, ...
- We will build a MIPS datapath incrementally
  - Refining the overview design

Instruction Fetch
- PC: 32-bit register
- Increment by 4 for next instruction

R-Format Instructions
- Read two register operands
- Perform arithmetic/logical operation
- Write register result

Load/Store Instructions
- Read register operands
- Calculate address using 16-bit offset
  - Use ALU, but sign-extend offset
- Load: read memory and update register
- Store: write register value to memory
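The address calculation for lw and sw can be illustrated as follows. This is a sketch under the assumption of 32-bit addresses that wrap around; the function names are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, offset16: int) -> int:
    """Base register value plus sign-extended 16-bit offset, modulo 2**32."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF
```

With a base of 0x1000 and an offset field of 0xFFFC (that is, -4), the effective address is 0x0FFC.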

Branch Instructions
- Read register operands
- Compare operands
  - Use ALU, subtract and check Zero output
- Calculate target address
  - Sign-extend displacement
  - Shift left 2 places (word displacement)
  - Add to PC + 4
    - Already calculated by instruction fetch

Branch Instructions
- Shift left 2: just re-routes wires
- Sign-extend: sign-bit wire replicated
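The branch steps listed on the two slides above (compare by subtraction, sign-extend, shift left 2, add to PC + 4) can be sketched as below. The function names and the 32-bit wrap-around are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Replicate the sign bit of a 16-bit field."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, imm16: int) -> int:
    """(PC + 4) + (sign-extended displacement << 2), modulo 2**32."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


def next_pc_beq(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """beq: subtract the operands and take the branch when the result is zero."""
    zero = ((rs_val - rt_val) & MASK32) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & MASK32
```

A displacement of -1 (0xFFFF) branches back to the beq instruction itself, because the displacement is relative to PC + 4.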

R-Type/Load/Store Datapath

Composing the Elements
- First-cut datapath does an instruction in one clock cycle
  - Each datapath element can only do one function at a time
  - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

Full Datapath

ALU Control
- ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR

ALU Control
- Assume 2-bit ALUOp derived from opcode
  - Combinational logic derives ALU control

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
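The ALU control table can be expressed as a small lookup function. This is a sketch of the combinational logic as a table lookup rather than as gates; the names `alu_control` and `FUNCT_TO_CONTROL` are hypothetical, while the bit patterns are the ones listed on the slide.

```python
# funct field -> 4-bit ALU control, used only for R-type (ALUOp = 10)
FUNCT_TO_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Derive the 4-bit ALU control from 2-bit ALUOp and the 6-bit funct field."""
    if alu_op == 0b00:          # lw / sw: funct is a don't-care, ALU adds
        return 0b0010
    if alu_op == 0b01:          # beq: ALU subtracts
        return 0b0110
    if alu_op == 0b10:          # R-type: operation chosen by funct
        return FUNCT_TO_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")
```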

The Main Control Unit
- Control signals derived from instruction

R-type       0          rs      rt      rd      shamt   funct
             31:26      25:21   20:16   15:11   10:6    5:0
Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0
Branch       4          rs      rt      address
             31:26      25:21   20:16   15:0

- Bits 31:26: opcode
- rs: always read
- rt: read, except for load
- rd (R-type) or rt (load): write for R-type and load
- address: sign-extend and add
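The field positions in the formats above translate directly into shifts and masks. The sketch below is illustrative; the function name `decode` and the dictionary keys are assumptions, and the two sample words were encoded by hand from the field layout (add $t0, $s1, $s2 and lw $t0, 32($s3)).

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31:26
        "rs":     (instr >> 21) & 0x1F,   # bits 25:21
        "rt":     (instr >> 16) & 0x1F,   # bits 20:16
        "rd":     (instr >> 11) & 0x1F,   # bits 15:11 (R-type only)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10:6  (R-type only)
        "funct":  instr & 0x3F,           # bits 5:0   (R-type only)
        "address": instr & 0xFFFF,        # bits 15:0  (load/store/branch)
    }


add_fields = decode(0x02324020)  # add $t0, $s1, $s2
lw_fields = decode(0x8E680020)   # lw  $t0, 32($s3)
```

All fields are extracted unconditionally, much as the hardware wires every field out of the instruction word and lets the control signals decide which ones are used.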

Datapath With Control

R-Type Instruction

Load Instruction

Branch-on-Equal Instruction

Implementing Jumps

Jump   2       address
       31:26   25:0

- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode
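The concatenation described above can be written as a mask and a shift. This is a minimal sketch; the function name is hypothetical, and the `pc` argument stands for whatever value the slide calls the "old PC" (in the standard MIPS definition the upper bits come from PC + 4).

```python
def jump_target(pc: int, addr26: int) -> int:
    """Concatenate the top 4 bits of pc, the 26-bit jump address, and 00."""
    upper4 = pc & 0xF0000000             # top 4 bits of the PC value
    word_addr = (addr26 & 0x03FFFFFF)    # 26-bit field from bits 25:0
    return upper4 | (word_addr << 2)     # shifting left by 2 appends "00"
```

For instance, a jump field of 0x0100000 executed near address 0x00400004 produces the target 0x00400000.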

Datapath With Jumps Added

Five stages in MIPS instruction execution
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

Performance
- Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages

Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw         200ps         100ps           200ps    200ps           100ps            800ps
sw         200ps         100ps           200ps    200ps                            700ps
R-format   200ps         100ps           200ps                    100ps            600ps
beq        200ps         100ps           200ps                                     500ps
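The totals in the table follow from summing the delays of the steps each instruction class uses, and the single-cycle clock period is the largest of those totals. The sketch below reproduces that arithmetic; the dictionary and step names are hypothetical labels for the table columns.

```python
# Delay of each step in picoseconds, from the assumptions on the slide
STEP_PS = {
    "instr_fetch": 200,
    "reg_read": 100,
    "alu_op": 200,
    "mem_access": 200,
    "reg_write": 100,
}

# Steps used by each instruction class (blank table cells are omitted)
STEPS_USED = {
    "lw":       ["instr_fetch", "reg_read", "alu_op", "mem_access", "reg_write"],
    "sw":       ["instr_fetch", "reg_read", "alu_op", "mem_access"],
    "R-format": ["instr_fetch", "reg_read", "alu_op", "reg_write"],
    "beq":      ["instr_fetch", "reg_read", "alu_op"],
}

total_ps = {
    instr: sum(STEP_PS[step] for step in steps)
    for instr, steps in STEPS_USED.items()
}

# In a single-cycle design the longest instruction sets the clock period.
clock_period_ps = max(total_ps.values())
critical_instr = max(total_ps, key=total_ps.get)
```

Here `critical_instr` is lw, so every instruction pays the 800 ps load latency in the single-cycle design.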

Performance Issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory -> register file -> ALU -> data memory -> register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast
- We will improve performance by pipelining

Pipelining Analogy
- Pipelined laundry: overlapping execution
  - Parallelism improves performance
- Four loads:
  - Speedup = 8/3.5 = 2.3
- Non-stop (n loads, n large):
  - Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
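The two speedup figures come from the same formula evaluated at n = 4 and at large n. The sketch below assumes four laundry stages of 0.5 hours each, which is what the numbers 8 and 3.5 on the slide imply; the function names are hypothetical.

```python
STAGES = 4          # assumption: four laundry steps per load
STAGE_HOURS = 0.5   # assumption: each step takes half an hour


def sequential_hours(n: int) -> float:
    """Each load runs start to finish before the next begins: 2n hours."""
    return n * STAGES * STAGE_HOURS


def pipelined_hours(n: int) -> float:
    """First load takes all stages; each later load adds one stage time: 0.5n + 1.5."""
    return (STAGES + n - 1) * STAGE_HOURS


def speedup(n: int) -> float:
    return sequential_hours(n) / pipelined_hours(n)
```

`speedup(4)` evaluates 8/3.5, and as n grows the constant 1.5 hours of fill time becomes negligible, so the ratio approaches 2n/0.5n = 4.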

MIPS Pipeline
- Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register

Pipeline Performance
- Single-cycle (Tc = 800ps)
- Pipelined (Tc = 200ps)

Pipeline Speedup
- If all stages are balanced
  - i.e., all take the same time

  \text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}

- If not balanced, speedup is less
- Speedup due to increased throughput
  - Latency (time for each instruction) does not decrease
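Using the 800 ps single-cycle and 200 ps pipelined clock periods from the Pipeline Performance slide, total execution time for n instructions can be compared as follows. This is a sketch that assumes an ideal five-stage pipeline with no stalls; the function names are hypothetical.

```python
SINGLE_CYCLE_TC_PS = 800   # single-cycle clock period
PIPELINED_TC_PS = 200      # pipelined clock period
NUM_STAGES = 5


def single_cycle_time_ps(n: int) -> int:
    """One instruction per (long) clock cycle."""
    return n * SINGLE_CYCLE_TC_PS


def pipelined_time_ps(n: int) -> int:
    """NUM_STAGES cycles for the first instruction, then one more cycle per instruction."""
    return (n + NUM_STAGES - 1) * PIPELINED_TC_PS


def speedup(n: int) -> float:
    return single_cycle_time_ps(n) / pipelined_time_ps(n)
```

The stages are not balanced here (register read and write need only 100 ps but still occupy a 200 ps cycle), so the long-run speedup approaches 800/200 = 4 rather than the stage count of 5. The latency of a single instruction actually rises from 800 ps to 5 x 200 = 1000 ps.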

Pipelining and ISA Design
- MIPS ISA designed for pipelining
  - All instructions are 32 bits
    - Easier to fetch and decode in one cycle
    - cf. x86: 1- to 17-byte instructions
  - Few and regular instruction formats
    - Can decode and read registers in one step
  - Load/store addressing
    - Can calculate address in 3rd stage, access memory in 4th stage
  - Alignment of memory operands
    - Memory access takes only one cycle

Hazards
- Situations that prevent starting the next instruction in the next cycle
- Structural hazard
  - A required resource is busy
- Data hazard
  - Need to wait for previous instruction to complete its data read/write
- Control hazard
  - Deciding on control action depends on previous instruction

MIPS Pipelined Datapath
- Right-to-left flow leads to hazards (marked at the MEM and WB stages in the figure)

Pipeline Registers
- State registers between each pipeline stage to isolate them
  - Hold information produced in previous cycles

MIPS Pipeline Datapath Additions/Mods
[Figure: pipelined datapath with stages IF:IFetch, ID:Dec, EX:Execute, MEM:MemAccess, WB:WriteBack, separated by the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB. Components shown: PC, Add (+4), Instruction Memory, Register File, Sign Extend (16 to 32), Shift left 2, Add, ALU, Data Memory, System Clock]

Pipelined Control
- Control signals derived from instruction
  - As in single-cycle implementation
- All control signals can be determined during decode
  - And held in the state registers

MIPS Pipeline Control Path Modifications
[Figure: pipelined datapath with a Control unit added after IF/ID. Control signals shown: PCSrc, RegWrite, ALUSrc, ALUOp, ALU cntrl, Branch, MemRead, MemtoReg, RegDst, carried through the ID/EX, EX/MEM, and MEM/WB registers]

Pipeline Control
- IF Stage: read Instr Memory (always asserted) and write PC (on System Clock)
- ID Stage: no optional control signals to set

        EX Stage                             MEM Stage                     WB Stage
Instr   RegDst   ALUOp1   ALUOp0   ALUSrc    Brch   MemRead   MemWrite     RegWrite   MemtoReg
R       1        1        0        0         0      0         0            1          0
lw      0        0        0        1         0      1         0            1          1
sw      X        0        0        1         0      0         1            0          X
beq     X        0        1        0         1      0         0            0          X
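The control table can be stored as data and grouped by the stage that consumes each signal, which mirrors how the signals travel through the pipeline registers. This is an illustrative sketch; the names `CONTROL`, `STAGE_SIGNALS`, and `signals_for_stage` are hypothetical, and `None` stands for the don't-care entries marked X.

```python
SIGNALS = ["RegDst", "ALUOp1", "ALUOp0", "ALUSrc",
           "Branch", "MemRead", "MemWrite",
           "RegWrite", "MemtoReg"]

X = None  # don't-care

# One row per instruction class, values in the column order of SIGNALS
ROWS = {
    "R":   [1, 1, 0, 0, 0, 0, 0, 1, 0],
    "lw":  [0, 0, 0, 1, 0, 1, 0, 1, 1],
    "sw":  [X, 0, 0, 1, 0, 0, 1, 0, X],
    "beq": [X, 0, 1, 0, 1, 0, 0, 0, X],
}

CONTROL = {instr: dict(zip(SIGNALS, row)) for instr, row in ROWS.items()}

# Which stage uses which signals
STAGE_SIGNALS = {
    "EX":  ["RegDst", "ALUOp1", "ALUOp0", "ALUSrc"],
    "MEM": ["Branch", "MemRead", "MemWrite"],
    "WB":  ["RegWrite", "MemtoReg"],
}


def signals_for_stage(instr: str, stage: str) -> dict:
    """Control values an instruction class needs in a given stage."""
    return {name: CONTROL[instr][name] for name in STAGE_SIGNALS[stage]}
```

For example, `signals_for_stage("lw", "WB")` gives RegWrite = 1 and MemtoReg = 1, the two values that must still be held in the MEM/WB register when the load reaches write-back.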

Graphically Representing MIPS Pipeline
[Figure: one instruction drawn as the stage sequence IM, Reg, ALU, DM, Reg]
- Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
  - Is there a hazard, why does it occur, and how can it be fixed?

Pipelined Execution
[Figure: multi-cycle diagram. Horizontal axis: time (clock cycles). Vertical axis: instruction order (Inst 0 to Inst 4). Each instruction passes through IM, Reg, ALU, DM, Reg, starting one cycle after the previous one. The first cycles are annotated "Time to fill the pipeline"]
- Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
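A diagram of this kind can be generated for any number of instructions, which also answers questions such as "what is the ALU doing during cycle 4?". The sketch below assumes an ideal pipeline with no hazards or stalls and uses the stage labels from the figure; the function names are hypothetical.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]  # stage labels used in the figure


def pipeline_diagram(n_instructions: int) -> list:
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name   # instruction i enters stage s in cycle i + s + 1
        rows.append(row)
    return rows


def alu_user(rows: list, cycle: int):
    """Index of the instruction using the ALU in the given 1-based cycle, or None."""
    for i, row in enumerate(rows):
        if row[cycle - 1] == "ALU":
            return i
    return None


def render(rows: list) -> str:
    """Format the diagram as fixed-width text, one line per instruction."""
    header = " " * 8 + " ".join(f"C{c + 1:<3}" for c in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"Inst {i:<3}" + " ".join(f"{cell:<4}" for cell in row))
    return "\n".join(lines)


rows = pipeline_diagram(5)
print(render(rows))
```

With five instructions the diagram spans 5 + 4 = 9 cycles: four cycles to fill the pipeline, after which one instruction completes per cycle.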

Pipeline Operation
- Cycle-by-cycle flow of instructions through the pipelined datapath
  - Single-clock-cycle pipeline diagram
    - Shows pipeline usage in a single cycle
    - Highlight resources used
  - Multi-clock-cycle diagram
    - Graph of operation over time

Multi-Cycle Pipeline Diagram
- Form showing resource usage

Multi-Cycle Pipeline Diagram
- Traditional form

Single-Cycle Pipeline Diagram
- State of pipeline in a given cycle