EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

Similar documents
EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 13 EE141

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

Processor (I) - datapath & control. Hwansoo Han

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

CS61C : Machine Structures

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

Systems Architecture

Fundamentals of Computer Systems

UC Berkeley CS61C : Machine Structures

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

ECE232: Hardware Organization and Design

inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 18 CPU Design: The Single-Cycle I ! Nasty new windows vulnerability!

Computer Science 61C Spring Friedland and Weaver. The MIPS Datapath

Lecture 8: Control COS / ELE 375. Computer Architecture and Organization. Princeton University Fall Prof. David August

Adding Support for jal to Single Cycle Datapath (For More Practice Exercise 5.20)

Single Cycle CPU Design. Mehran Rezaei

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Chapter 4. The Processor. Computer Architecture and IC Design Lab

CPU Organization (Design)

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 28: Single- Cycle CPU Datapath Control Part 1

Chapter 4. The Processor

The Processor: Datapath & Control

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Chapter 4. The Processor

LECTURE 5. Single-Cycle Datapath and Control

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

COMP303 Computer Architecture Lecture 9. Single Cycle Control

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

CPE 335 Computer Organization. Basic MIPS Architecture Part I

The MIPS Processor Datapath

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ECE170 Computer Architecture. Single Cycle Control. Review: 3b: Add & Subtract. Review: 3e: Store Operations. Review: 3d: Load Operations

CS3350B Computer Architecture Quiz 3 March 15, 2018

CENG 3420 Lecture 06: Datapath

MIPS-Lite Single-Cycle Control

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

CS 110 Computer Architecture Single-Cycle CPU Datapath & Control

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

CS61C : Machine Structures

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

CS3350B Computer Architecture Winter Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2)

Major CPU Design Steps

361 control.1. EECS 361 Computer Architecture Lecture 9: Designing Single Cycle Control

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

CSEN 601: Computer System Architecture Summer 2014

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

EEM 486: Computer Architecture. Lecture 3. Designing Single Cycle Control

Chapter 4. The Processor Designing the datapath

Ch 5: Designing a Single Cycle Datapath

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

Working on the Pipeline

CSCI 402: Computer Architectures. Fengguang Song Department of Computer & Information Science IUPUI. Today s Content

CO Computer Architecture and Programming Languages CAPL. Lecture 18 & 19

CS61C : Machine Structures

Review: Abstract Implementation View

Systems Architecture I

COMP2611: Computer Organization. The Pipelined Processor

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

RISC Processor Design

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design

ECS 154B Computer Architecture II Spring 2009

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization

Computer Hardware Engineering

Design of the MIPS Processor

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

Chapter 5 Solutions: For More Practice

Inf2C - Computer Systems Lecture Processor Design Single Cycle

CPE 335. Basic MIPS Architecture Part II

Chapter 4 The Processor 1. Chapter 4A. The Processor

Laboratory 5 Processor Datapath

CSE 378 Midterm 2/12/10 Sample Solution

Lecture 7 Pipelining. Peng Liu.

ECE369. Chapter 5 ECE369

Introduction. Datapath Basics

Lecture #17: CPU Design II Control

Processor: Multi- Cycle Datapath & Control

CS3350B Computer Architecture Winter 2015

Inf2C - Computer Systems Lecture 12 Processor Design Multi-Cycle

ECE468 Computer Organization and Architecture. Designing a Single Cycle Datapath

Lecture 10: Simple Data Path

Lecture 5 and 6. ICS 152 Computer Systems Architecture. Prof. Juan Luis Aragón

CS 61C: Great Ideas in Computer Architecture Control and Pipelining

CS 351 Exam 2 Mon. 11/2/2015

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

CPU Design Steps. EECC550 - Shaaban

CS Computer Architecture Spring Week 10: Chapter

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition. Chapter 4. The Processor

UC Berkeley CS61C : Machine Structures

How to design a controller to produce signals to control the datapath

CC 311- Computer Architecture. The Processor - Control

ECE260: Fundamentals of Computer Engineering

Processor. Han Wang CS3410, Spring 2012 Computer Science Cornell University. See P&H Chapter , 4.1 4

Transcription:

EECS150 - Digital Design Lecture 9- CPU Microarchitecture Feb 15, 2011 John Wawrzynek Spring 2011 EECS150 - Lec09-cpu Page 1 Watson: Jeopardy-playing Computer Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) for a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight core processor, with four threads per core. and it still takes ~15 seconds a question. Each core can do 8 double-precision FLOPS/cycle. So total is 2880*3.5*8 > 80 TFLOPS Spring 2011 EECS150 - Lec09-cpu Page 2

Processor Microarchitecture Introduction Microarchitecture: how to implement an architecture in hardware Good examples of how to put principles of digital design to practice. Introduction to final project. Spring 2011 EECS150 - Lec09-cpu Page 3 MIPS Processor Architecture For now we consider a subset of MIPS instructions: R-type instructions: and, or, add, sub, slt Memory instructions: lw, sw Branch instructions: beq Later we ll add addi and j Spring 2011 EECS150 - Lec09-cpu Page 4

MIPS Micrarchitecture Oganization Datapath + Controller + External Memory Controller Spring 2011 EECS150 - Lec09-cpu Page 5 How to Design a Processor: step-by-step 1. Analyze instruction set architecture (ISA) datapath requirements meaning of each instruction is given by the data transfers (register transfers) datapath must include storage element for ISA registers datapath must support each data transfer 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the data transfer. 5. Assemble the control logic. Spring 2011 EECS150 - Lec09-cpu Page 6

Review: The MIPS Instruction R-type I-type J-type 31 31 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 26 21 16 op rs rt address/immediate 6 bits 5 bits 5 bits 16 bits 26 op target address 6 bits 26 bits 0 0 0 The different fields are: op: operation ( opcode ) of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the op field address / immediate: address offset or immediate value target address: target address of jump instruction Spring 2011 EECS150 - Lec09-cpu Page 7 add, sub, or, slt addu rd,rs,rt subu rd,rs,rt lw, sw lw rt,rs,imm16 sw rt,rs,imm16 beq beq rs,rt,imm16 Subset for Lecture 31 31 31 26 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 26 Spring 2011 EECS150 - Lec09-cpu Page 8 21 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 26 21 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 16 16 16 11 6 0 0 0

Register Transfer Descriptions All start with instruction fetch: {op, rs, rt, rd, shamt, funct} IMEM[ PC ] OR {op, rs, rt, Imm16} IMEM[ PC ] THEN inst Register Transfers add R[rd] R[rs] + R[rt]; PC PC + 4 sub R[rd] R[rs] R[rt]; PC PC + 4 or R[rd] R[rs] R[rt]; PC PC + 4 slt R[rd] (R[rs] < R[rt])? 1 : 0; PC PC + 4 lw R[rt] DMEM[ R[rs] + sign_ext(imm16)]; PC PC + 4 sw DMEM[ R[rs] + sign_ext(imm16) ] R[rt]; PC PC + 4 beq if ( R[rs] == R[rt] ) then PC PC + 4 + {sign_ext(imm16), 00} else PC PC + 4 Spring 2011 EECS150 - Lec09-cpu Page 9 Microarchitecture Multiple implementations for a single architecture: Single-cycle Each instruction executes in a single clock cycle. Multicycle Each instruction is broken up into a series of shorter steps with one step per clock cycle. Pipelined (variant on multicycle ) Each instruction is broken up into a series of steps with one step per clock cycle Multiple instructions execute at once. Spring 2011 EECS150 - Lec09-cpu Page 10

CPU clocking (1/2) Single Cycle CPU: All stages of an instruction are completed within one long clock cycle. The clock cycle is made sufficient long to allow each instruction to complete all stages without interruption and within one cycle. 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Spring 2011 EECS150 - Lec09-cpu Page 11 CPU clocking (2/2) Multiple-cycle CPU: Only one stage of instruction per clock cycle. The clock is made as long as the slowest stage. 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Several significant advantages over single cycle execution: Unused stages in a particular instruction can be skipped OR instructions can be pipelined (overlapped). Spring 2011 EECS150 - Lec09-cpu Page 12

MIPS State Elements Determines everything about the execution status of a processor: PC register 32 registers Memory Note: for these state elements, clock is used for write but not for read (asynchronous read, synchronous write). Spring 2011 EECS150 - Lec09-cpu Page 13 Single-Cycle Datapath: lw fetch First consider executing lw STEP 1: Fetch instruction R[rt] DMEM[ R[rs] + sign_ext(imm16)] Spring 2011 EECS150 - Lec09-cpu Page 14

Single-Cycle Datapath: lw register read R[rt] DMEM[ R[rs] + sign_ext(imm16)] STEP 2: Read source operands from register file Spring 2011 EECS150 - Lec09-cpu Page 15 Single-Cycle Datapath: lw immediate R[rt] DMEM[ R[rs] + sign_ext(imm16)] STEP 3: Sign-extend the immediate Spring 2011 EECS150 - Lec09-cpu Page 16

Single-Cycle Datapath: lw address R[rt] DMEM[ R[rs] + sign_ext(imm16)] STEP 4: Compute the memory address Spring 2011 EECS150 - Lec09-cpu Page 17 Single-Cycle Datapath: lw memory read R[rt] DMEM[ R[rs] + sign_ext(imm16)] STEP 5: Read data from memory and write it back to register file Spring 2011 EECS150 - Lec09-cpu Page 18

Single-Cycle Datapath: lw PC increment STEP 6: Determine the address of the next instruction PC PC + 4 Spring 2011 EECS150 - Lec09-cpu Page 19 Single-Cycle Datapath: sw DMEM[ R[rs] + sign_ext(imm16) ] R[rt] Write data in rt to memory Spring 2011 EECS150 - Lec09-cpu Page 20

Single-Cycle Datapath: R-type instructions Read from rs and rt Write ALUResult to register file Write to rd (instead of rt) R[rd] R[rs] op R[rt] Spring 2011 EECS150 - Lec09-cpu Page 21 Single-Cycle Datapath: beq if ( R[rs] == R[rt] ) then PC PC + 4 + {sign_ext(imm16), 00} Determine whether values in rs and rt are equal Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC+4) Spring 2011 EECS150 - Lec09-cpu Page 22

Complete Single-Cycle Processor Spring 2011 EECS150 - Lec09-cpu Page 23 Review: ALU F 2:0 Function 000 A & B 001 A B 010 A + B 011 not used 100 A & ~B 101 A ~B 110 A - B 111 SLT Spring 2011 EECS150 - Lec09-cpu Page 24

Control Unit Spring 2011 EECS150 - Lec09-cpu Page 25 Control Unit: ALU Decoder ALUOp 1:0 Meaning 00 Add 01 Subtract 10 Look at Funct 11 Not Used ALUOp 1:0 Funct ALUControl 2:0 00 XXXXXX 010 (Add) X1 XXXXXX 110 (Subtract) 1X 100000 (add) 010 (Add) 1X 100010 (sub) 110 (Subtract) 1X 100100 (and) 000 (And) 1X 100101 (or) 001 (Or) Spring 2011 EECS150 - Lec09-cpu Page 26 1X 101010 (slt) 111 (SLT)

Control Unit: Main Decoder Instruction Op 5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp 1:0 R-type 000000 lw 100011 sw 101011 beq 000100 Spring 2011 EECS150 - Lec09-cpu Page 27 Control Unit: Main Decoder Instruction Op 5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp 1:0 R-type 000000 1 1 0 0 0 0 10 lw 100011 1 0 1 0 0 0 00 sw 101011 0 X 1 0 1 X 00 beq 000100 0 X 0 1 0 X 01 Spring 2011 EECS150 - Lec09-cpu Page 28

Single-Cycle Datapath Example: or Spring 2011 EECS150 - Lec09-cpu Page 29 Extended Functionality: addi No change to datapath Spring 2011 EECS150 - Lec09-cpu Page 30

Control Unit: addi Instruction Op 5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp 1:0 R-type 000000 1 1 0 0 0 0 10 lw 100011 1 0 1 0 0 1 00 sw 101011 0 X 1 0 1 X 00 beq 000100 0 X 0 1 0 X 01 addi 001000 Spring 2011 EECS150 - Lec09-cpu Page 31 Control Unit: addi Instruction Op 5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp 1:0 R-type 000000 1 1 0 0 0 0 10 lw 100011 1 0 1 0 0 1 00 sw 101011 0 X 1 0 1 X 00 beq 000100 0 X 0 1 0 X 01 addi 001000 1 0 1 0 0 0 00 Spring 2011 EECS150 - Lec09-cpu Page 32

Extended Functionality: j Spring 2011 EECS150 - Lec09-cpu Page 33 Control Unit: Main Decoder Instruction Op 5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp 1:0 Jump R-type 000000 1 1 0 0 0 0 10 0 lw 100011 1 0 1 0 0 1 00 0 sw 101011 0 X 1 0 1 X 00 0 beq 000100 0 X 0 1 0 X 01 0 j 000100 Spring 2011 EECS150 - Lec09-cpu Page 34

Control Unit: Main Decoder Instructi on Op 5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp 1:0 Jump R-type 0 1 1 0 0 0 0 10 0 lw 100011 1 0 1 0 0 1 0 0 sw 101011 0 X 1 0 1 X 0 0 beq 100 0 X 0 1 0 X 1 0 j 100 0 X X X 0 X XX 1 Spring 2011 EECS150 - Lec09-cpu Page 35 Review: Processor Performance Program Execution Time = (# instructions)(cycles/instruction)(seconds/cycle) = # instructions x CPI x T C Spring 2011 EECS150 - Lec09-cpu Page 36

Single-Cycle Performance T C is limited by the critical path (lw) Spring 2011 EECS150 - Lec09-cpu Page 37 Single-Cycle Performance Single-cycle critical path: T c = t pcq_pc + t mem + max(t RFread, t sext + t mux ) + t ALU + t mem + t mux + t RFsetup In most implementations, limiting paths are: memory, ALU, register file. T c = t pcq_pc + 2t mem + t RFread + t mux + t ALU + t RFsetup Spring 2011 EECS150 - Lec09-cpu Page 38

Single-Cycle Performance Example Element Parameter Delay (ps) Register clock-to-q t pcq_pc 30 Register setup t setup 20 Multiplexer t mux 25 ALU t ALU 200 Memory read t mem 250 Register file read t RFread 150 Register file setup t RFsetup 20 T c = Spring 2011 EECS150 - Lec09-cpu Page 39 Single-Cycle Performance Example Element Parameter Delay (ps) Register clock-to-q t pcq_pc 30 Register setup t setup 20 Multiplexer t mux 25 ALU t ALU 200 Memory read t mem 250 Register file read t RFread 150 Register file setup t RFsetup 20 T c = t pcq_pc + 2t mem + t RFread + t mux + t ALU + t RFsetup = [30 + 2(250) + 150 + 25 + 200 + 20] ps = 925 ps Spring 2011 EECS150 - Lec09-cpu Page 40

Single-Cycle Performance Example For a program with 100 billion instructions executing on a singlecycle MIPS processor, Execution Time = Spring 2011 EECS150 - Lec09-cpu Page 41 Single-Cycle Performance Example For a program with 100 billion instructions executing on a singlecycle MIPS processor, Execution Time = # instructions x CPI x T C = (100 10 9 )(1)(925 10-12 s) = 92.5 seconds Spring 2011 EECS150 - Lec09-cpu Page 42

Pipelined MIPS Processor Temporal parallelism Divide single-cycle processor into 5 stages: Fetch Decode Execute Memory Writeback Add pipeline registers between stages Spring 2011 EECS150 - Lec09-cpu Page 43 Single-Cycle vs. Pipelined Performance Spring 2011 EECS150 - Lec09-cpu Page 44

Pipelining Abstraction Spring 2011 EECS150 - Lec09-cpu Page 45 Single-Cycle and Pipelined Datapath Spring 2011 EECS150 - Lec09-cpu Page 46

Corrected Pipelined Datapath WriteReg must arrive at the same time as Result Spring 2011 EECS150 - Lec09-cpu Page 47 Pipelined Control Same control unit as single-cycle processor Spring 2011 EECS150 - Lec09-cpu Page 48 Control delayed to proper pipeline stage

Pipeline Hazards Occurs when an instruction depends on results from previous instruction that hasn t completed. Types of hazards: Data hazard: register value not written back to register file yet Control hazard: next instruction not decided yet (caused by branches) Spring 2011 EECS150 - Lec09-cpu Page 49