ECE 4750 Computer Architecture Topic 2: From CISC to RISC


Christopher Batten
School of Electrical and Computer Engineering, Cornell University
http://www.csl.cornell.edu/courses/ece4750
Slide revision: 2013-09-08-23-34

CPI for a Microcoded Machine

Inst 1: 7 cycles
Inst 2: 5 cycles
Inst 3: 10 cycles

Total clock cycles = 7 + 5 + 10 = 22
Total instructions = 3
Clocks per instruction (CPI) = 22 / 3 = 7.33

CPI is always an average over a large number of instructions.

ECE 4750 T02: From CISC to RISC 2 / 43
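The CPI arithmetic above generalizes to any trace of per-instruction cycle counts. The following is a minimal Python sketch; the function name `cpi` is an illustrative choice, not something defined in the slides.

```python
def cpi(cycles_per_instruction: list[int]) -> float:
    """Average clocks per instruction: total cycles divided by total instructions."""
    if not cycles_per_instruction:
        raise ValueError("need at least one instruction")
    return sum(cycles_per_instruction) / len(cycles_per_instruction)


# Cycle counts for Inst 1, Inst 2, and Inst 3 on the microcoded machine.
microcoded = [7, 5, 10]
print(f"total cycles = {sum(microcoded)}, CPI = {cpi(microcoded):.2f}")
# prints: total cycles = 22, CPI = 7.33
```

With only three instructions this is just the slide's example; in practice the list would be a long dynamic trace, which is why CPI is treated as an average.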

Iron Law of Processor Performance

$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$

Instructions / program depends on source code, compiler, and ISA
Cycles / instruction (CPI) depends on ISA and microarchitecture
Time / cycle depends on microarchitecture and implementation

Microarchitecture            CPI   Cycle Time
Microcoded                   >1    short        (last topic)
Single-Cycle Unpipelined     1     long         (this topic)
Multi-Cycle Unpipelined      >1    short        (this topic)
Pipelined                    1     short        (next topic)
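To see how the three Iron Law terms trade off, here is a small Python sketch. The `Design` class and every number in it (instruction count, CPIs, cycle times) are hypothetical values chosen only for illustration; they are not measurements from the slides.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    instructions: int      # dynamic instructions per program
    cpi: float             # average cycles per instruction
    cycle_time_ns: float   # clock period (ns)

    def time_ns(self) -> float:
        # Iron Law: time/program = inst/program * cycles/inst * time/cycle
        return self.instructions * self.cpi * self.cycle_time_ns


# Hypothetical numbers, chosen only to show the CPI vs. cycle-time trade-off.
single = Design("single-cycle unpipelined", 1_000_000, cpi=1.0, cycle_time_ns=10.0)
multi = Design("multi-cycle unpipelined", 1_000_000, cpi=4.33, cycle_time_ns=2.5)

for d in (single, multi):
    print(f"{d.name}: {d.time_ns() / 1e6:.3f} ms")
```

With these assumed values the multi-cycle design is slightly slower: its 4× shorter cycle does not fully offset its 4.33× higher CPI. Changing any one term changes the winner, which is the point of the table above.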

Agenda
Technology Trends Motivating RISC
Memory Basics
Single-Cycle Unpipelined MIPS Processor
Multi-Cycle Unpipelined MIPS Processor

Minicomputers in the 1970s

Implemented with racks of discrete components
Used microcode to implement a CISC ISA
Applications in business, scientific, and commercial computing

The extremely popular VAX-11/780 was first available in 1977 and was often used as a baseline for benchmarking; it was assumed to have a speed of 1M instructions/second (1 MIPS): 5 MHz, TTL devices.

Microprocessors in the 1970s

Microprocessors were made possible by new integrated circuit technology
Constrained by what could fit on a single chip, leading to few-bit datapaths with hardwired control
Initial application was embedded control
8-bit microprocessors were used in hobbyist personal computers: Micral, Altair, TRS-80, Apple II
Usually had a 16-bit address space (64 KB directly addressable)
Simple BASIC interpreter in ROM or on cassette tape

The first microprocessor is the Intel 4004, fabricated in 1971 and designed for a desktop printing calculator: 750 kHz, 8-16 cycles/inst, 8 µm PMOS, 2.3K transistors, 12 mm², microcoded control to implement a CISC ISA.

DRAM in the 1970s

Dramatic progress in MOSFET memory technology
1970: Intel introduces the first DRAM (Model 1103, 1 Kb)
1979: Fujitsu introduces a 64 Kb DRAM
By the mid-1970s it became obvious that microprocessors would soon have >64 KB of physical memory

VisiCalc as Killer App and Eventually the IBM PC

Analyzing Microcoded Machines

John Cocke and his group at IBM
Working on a simple pipelined processor, the 801, and advanced compilers
Ported the experimental PL8 compiler to the IBM 370 and used only simple register-register and load/store instructions, similar to the 801
Code ran faster than other existing compilers that used all 370 instructions! (up to 6 MIPS, whereas 2 MIPS was considered good before)

Joel Emer and Douglas Clark at DEC
Measured the VAX-11/780 using external hardware
Found it was actually a 0.5 MIPS machine, not a 1 MIPS machine
20% of VAX instructions account for 60% of the µcode, but only 0.2% of the dynamic execution

VAX 8800, the high-end VAX in 1984
Control store: 16K × 147b RAM; unified cache: 64K × 8b RAM
4.5× more microstore RAM than cache RAM!
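The VAX 8800 ratio follows directly from the two array sizes. This short Python check (variable names are illustrative) shows the exact ratio is about 4.59, which the slide quotes as 4.5×.

```python
# Sizes quoted on the slide for the VAX 8800 (1984).
K = 1024  # "K" taken as 2**10; the ratio is the same if K = 1000

control_store_bits = 16 * K * 147   # 16K words x 147 bits
cache_bits = 64 * K * 8             # 64K words x 8 bits

ratio = control_store_bits / cache_bits
print(control_store_bits, cache_bits, round(ratio, 2))
# prints: 2408448 524288 4.59
```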

From CISC to RISC

Key changes in technology constraints:
Logic, RAM, and ROM are all implemented with MOS transistors
RAM is the same speed as ROM

Use fast RAM to build a fast instruction cache of user-visible instructions, not fixed hardware microfragments
Change the contents of fast instruction memory to fit what the application needs

Use a simple ISA to enable a hardwired, pipelined implementation
Most compiled code used only a few of the CISC instructions
Simpler encoding allowed pipelined implementations
Load/store register-register ISA, as opposed to a memory-memory ISA

Further benefit with integration
In the early 1980s, a 32-bit datapath and small caches fit on a single chip
No chip crossing in the common case allows faster operation

From CISC to RISC
[Figure: a vertical µcode controller (µPC, ROM for µinstructions, small decoder) compared with a RISC controller (user PC, RAM for the instruction cache, "larger" decoder)]

Berkeley RISC Chips

RISC-I was fabricated in 1982 under the direction of David Patterson and was probably the first VLSI RISC processor: 1 MHz, 5 µm NMOS, 44.5K transistors, 77 mm²
RISC-II was the 1983 follow-up with several improvements: 3 MHz, 3 µm NMOS, 40.7K transistors, 60 mm²

Stanford MIPS Chips

The first MIPS prototype was fabricated in 1984 under the direction of John Hennessy; MIPS-X was the 1986 follow-up: 5-stage, 20 MHz, 2 µm 2-layer CMOS
John Hennessy left Stanford to form MIPS Computer Systems, whose first chip was the MIPS R2000 in 1986: 8-15 MHz, 2 µm 2-layer CMOS, 110K transistors, 80 mm²

MIPS vs. VAX
[Figure: ratio of MIPS to VAX (y-axis from 0.0 to 4.0) for instructions executed, CPI, and performance on the benchmarks spice, matrix, nasa7, fpppp, tomcatv, doduc, espresso, eqntott, and li. Annotations: 2× more instructions, 6× lower CPI, 2-4× higher performance. Source: H&P, Appendix J, from Bhandarkar and Clark, 1991.]
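The three annotations are linked by the Iron Law. A minimal Python sketch: the function name `speedup` is illustrative, and holding the cycle times equal is a simplifying assumption (the real machines' clocks differed), so the result is only a rough cross-check.

```python
def speedup(inst_ratio: float, cpi_ratio: float, cycle_ratio: float = 1.0) -> float:
    """Speedup of machine B over machine A, given B/A ratios of the Iron Law terms.

    time_B / time_A = inst_ratio * cpi_ratio * cycle_ratio; speedup is the inverse.
    """
    return 1.0 / (inst_ratio * cpi_ratio * cycle_ratio)


# MIPS relative to VAX: about 2x more instructions and about 6x lower CPI.
# Equal cycle times (cycle_ratio = 1) is an assumption made for illustration.
print(f"{speedup(inst_ratio=2.0, cpi_ratio=1 / 6):.1f}x")
```

The result, about 3×, sits inside the 2-4× range in the figure; the spread across benchmarks comes from the per-benchmark instruction and CPI ratios.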

CISC/RISC Convergence

MIPS R10K uses a sophisticated out-of-order engine; the branch delay slot is not useful.
[Figure: excerpt from Gwennap, "MIPS R10000 Uses Decoupled Architecture," Microprocessor Report, Vol. 8, No. 14, October 24, 1994, 1994 MicroDesign Resources.]
"... The new processor uses deep queues to decouple the instruction fetch logic from the execution units. Instructions that are ready to execute can jump ahead of those waiting for operands, increasing the utilization of the execution units. This technique, known as out-of-order execution, has been used in PowerPC processors for some time, but the new MIPS design is the most aggressive implementation yet, allowing more instructions to be queued than any of its competitors. ... The front end of the processor is responsible for maintaining a continuous flow of instructions into the queues, despite problems caused by branches and cache misses. As Figure 1 shows, the chip uses a two-way set-associative instruction cache of 32K. Like other highly superscalar designs, the R10000 predecodes instructions as they are loaded into this cache, which holds four extra bits per instruction. These bits reduce the time needed to determine the appropriate queue for each instruction. The processor fetches four instructions per cycle from the cache and decodes them. If a branch is discovered, it is immediately predicted; if it is predicted taken, the target address is sent to the instruction cache, redirecting the fetch stream. Because of the one cycle needed to decode the branch, taken branches create a bubble in the fetch stream; the deep queues, however, generally prevent this bubble from delaying the execution pipeline. The sequential instructions that are loaded during this extra cycle are not discarded but are saved in a resume cache. If the branch is later determined to have been mispredicted, the sequential instructions are reloaded from the resume cache, reducing the mispredicted branch penalty by one cycle. The resume cache has four entries of four instructions each, allowing speculative execution beyond four branches. The R10000 design uses the standard two-bit Smith method to predict ..."
[Article Figure 1: "The R10000 uses deep instruction queues to decouple the instruction fetch logic from the five function units." Block diagram labels include an 8-entry ITLB, a 64-entry main TLB, a 512 × 2 BHT, 32K two-way set-associative instruction and data caches, 16-entry integer, memory, and FP queues, a resume cache, active list, map table, predecode unit, 64 × 64-bit integer and FP register files, an L2 cache interface (512K-16M), and a system interface to the Avalanche bus (64-bit addr/data).]

Intel Nehalem front end breaks x86 CISC instructions into smaller RISC-like µops; a µcode engine handles rarely used complex instructions (Kanter, Real World Technologies, 2009).

Memory Basics

Register File with Combinational Read
[Figure: a single register built from n flip-flops (D0 ... Dn-1 to Q0 ... Qn-1) with clock and enable, and a 2R+1W register file with inputs ReadSel1 (rs1), ReadSel2 (rs2), WriteSel (ws), WriteData (wd), WE, and Clock, and outputs ReadData1 (rd1) and ReadData2 (rd2)]

Register File Implementation
[Figure: registers reg 0 through reg 31 with 5-bit selects rs1, rs2, and ws, 32-bit write data wd, and 32-bit read data rd1 and rd2]
Register files with a large number of ports are difficult to implement
MIPS instructions have at most two register source operands, so two read ports suffice
Intel's Itanium general-purpose register file has 128 registers with 8 read ports and 4 write ports!
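The 2R+1W behavior described above (combinational reads, a single write that takes effect at the clock edge) can be modeled behaviorally. A minimal Python sketch; the class and method names are hypothetical, and the rule that register 0 always reads as zero is a MIPS ISA convention not shown on these slides.

```python
class RegisterFile:
    """Behavioral sketch of a 32 x 32-bit register file with 2 read ports and 1 write port.

    Reads are combinational: read() returns current contents immediately.
    The single write port takes effect only at clock_edge(), and only when WE is set.
    Register 0 is hardwired to zero, as in the MIPS ISA.
    """

    NUM_REGS = 32
    MASK32 = 0xFFFF_FFFF

    def __init__(self) -> None:
        self.regs = [0] * self.NUM_REGS

    def read(self, rs1: int, rs2: int) -> tuple[int, int]:
        # Two independent read ports (ReadData1, ReadData2).
        return self.regs[rs1], self.regs[rs2]

    def clock_edge(self, we: bool, ws: int, wd: int) -> None:
        # One write port, updated on the rising clock edge when WE is asserted.
        if we and ws != 0:
            self.regs[ws] = wd & self.MASK32


rf = RegisterFile()
rf.clock_edge(we=True, ws=8, wd=42)
rf.clock_edge(we=True, ws=0, wd=99)   # ignored: register 0 stays zero
rf.clock_edge(we=False, ws=8, wd=7)   # ignored: WE not asserted
print(rf.read(8, 0))                  # prints: (42, 0)
```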

Magic Memory Model
[Figure: MAGIC RAM with inputs Address, WriteData, WriteEnable, and Clock, and output ReadData]
Read is combinational
Write is performed at the rising clock edge, if enabled
Write address must be stable at the clock edge
Later we will consider a more realistic memory

More Realistic Memory Model
[Figure: SRAM with inputs Address, WriteData, WriteEnable, and Clock, and output ReadData; timing diagrams for a simplified SRAM read and a simplified SRAM write]
Synchronous operation
Read data is ready in the next cycle
Read and write data buses share a single set of internal bit lines
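The difference between the two memory models is when read data becomes visible. A minimal Python sketch of both; the class names are hypothetical, and the SRAM model assumes one access (read or write) per cycle to reflect the shared internal bit lines.

```python
class MagicMemory:
    """Magic memory: combinational read; write at the rising clock edge if enabled."""

    def __init__(self, size: int) -> None:
        self.data = [0] * size

    def read(self, addr: int) -> int:
        return self.data[addr]             # value available in the same cycle

    def clock_edge(self, we: bool, addr: int, wdata: int) -> None:
        if we:                             # addr and wdata sampled at the edge
            self.data[addr] = wdata


class SyncSRAM:
    """Synchronous SRAM: one access per cycle; read data appears after the next edge."""

    def __init__(self, size: int) -> None:
        self.data = [0] * size
        self.read_data = None              # output register, updated at the clock edge

    def clock_edge(self, addr: int, we: bool = False, wdata: int = 0) -> None:
        # A single port (shared bit lines): each cycle is either a write or a read.
        if we:
            self.data[addr] = wdata
        else:
            self.read_data = self.data[addr]


magic = MagicMemory(16)
magic.clock_edge(we=True, addr=3, wdata=7)
print(magic.read(3))          # 7, visible combinationally

sram = SyncSRAM(16)
sram.clock_edge(addr=3, we=True, wdata=7)   # cycle 0: write
print(sram.read_data)                       # None: no read issued yet
sram.clock_edge(addr=3)                     # cycle 1: read request
print(sram.read_data)                       # 7, ready only after that edge
```

The extra cycle of read latency is what makes a single-cycle datapath hard to build with realistic SRAM, and it is one motivation for the multi-cycle design later in this topic.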

Single-Cycle Unpipelined MIPS Processor

MIPS Instruction Formats

ALU:      opcode 0 [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | 0 [10:6] | func [5:0]
          R[rd] ← R[rs] func R[rt]
ALUi:     opcode [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
          R[rt] ← R[rs] op immediate
LD/ST:    opcode [31:26] | rs [25:21] | rt [20:16] | offset [15:0]
          LD: R[rt] ← M[ R[rs] + sext(offset) ]
          ST: M[ R[rs] + sext(offset) ] ← R[rt]
BEQZ:     opcode [31:26] | rs [25:21] | 0 [20:16] | offset [15:0]
          if ( R[rs] == 0 ) PC ← PC+4 + offset*4
JR/JALR:  opcode [31:26] | rs [25:21] | 0 [20:16] | 0 [15:0]
          PC ← R[rs]; JALR also does R[31] ← PC+8
J/JAL:    opcode [31:26] | target [25:0]
          PC ← jtarg( PC, target ); JAL also does R[31] ← PC+8

Field widths in bits: 6 | 5 | 5 | 5 | 5 | 6 for ALU; 6 | 5 | 5 | 16 for ALUi, LD/ST, BEQZ, and JR/JALR; 6 | 26 for J/JAL.
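Because every format places its fields at fixed bit positions, decoding is just shifting and masking. A minimal Python sketch; the helper names are hypothetical, and the two example words use standard MIPS encodings (ADDU: opcode 0, func 0x21; LW: opcode 0x23) that are not listed on the slide.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] (inclusive bit positions, as labeled on the slide)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext16(x: int) -> int:
    """Sign-extend a 16-bit field: sext(offset) in the slide's notation."""
    return x - (1 << 16) if x & 0x8000 else x


def decode(inst: int) -> dict[str, int]:
    """Slice a 32-bit word into every field the formats use.

    Which fields are meaningful depends on the opcode: ALU uses rd/func,
    ALUi, LD/ST and BEQZ use the 16-bit immediate, J/JAL use target.
    """
    return {
        "opcode": bits(inst, 31, 26),
        "rs": bits(inst, 25, 21),
        "rt": bits(inst, 20, 16),
        "rd": bits(inst, 15, 11),
        "shamt": bits(inst, 10, 6),   # shown as 0 on the slide (shift amount in full MIPS)
        "func": bits(inst, 5, 0),
        "imm": bits(inst, 15, 0),
        "target": bits(inst, 25, 0),
    }


# addu $3, $1, $2 and lw $2, -4($29), using standard MIPS encodings.
addu = decode(0x00221821)
lw = decode(0x8FA2FFFC)
print(addu["rs"], addu["rt"], addu["rd"], hex(addu["func"]))    # prints: 1 2 3 0x21
print(hex(lw["opcode"]), lw["rs"], lw["rt"], sext16(lw["imm"]))  # prints: 0x23 29 2 -4
```

Keeping rs and rt in the same positions across formats is what lets the register file be read in parallel with decoding.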

Instruction Execution Steps
1. Instruction fetch
2. Decode and register fetch
3. ALU operation
4. Memory operation, if required
5. Register write-back, if required
Plus computation of the next instruction to fetch

Datapath: ALU Reg-Reg Instructions (ADDU)
[Figure: PC and a 0x4 adder feed the instruction memory; inst<25:21> and inst<20:16> select register-file read ports rs1 and rs2, and inst<15:11> selects the write port ws; the two read values feed the ALU, whose result is written back through wd; Control decodes the OpCode and inst<5:0> and drives RegWrite]

Datapath: ALU Reg-Imm Instructions (ADDIU)
[Figure: as for reg-reg instructions, but inst<20:16> selects the write port ws and inst<15:0> passes through the immediate extender (Imm Ext, controlled by ExtSel) to the ALU's second operand; Control decodes inst<31:26> (OpCode) and drives RegWrite]

Address Conflicts in Merged Datapath with Muxes
[Figure: the reg-reg and reg-imm datapaths overlaid; the write address ws can come from inst<20:16> or inst<15:11>, and the ALU's second operand can come from the register file or from Imm Ext, so muxes are needed to resolve the conflicts]

Datapath: ALU and ALUi Instructions
[Figure: merged datapath with a RegDst mux (rt / rd) selecting the write address and a BSrc mux (Reg / Imm) selecting the ALU's second operand; Control decodes <31:26> and <5:0> and drives RegWrite, ExtSel, OpSel, RegDst, and BSrc]
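The immediate extender and the two muxes introduced here are small combinational functions. A minimal Python sketch; the function names are hypothetical, the ExtSel names follow the hardwired-controller slide (sext16, uext16, High16), and treating High16 as placing the immediate in the upper half-word is an assumption.

```python
MASK32 = 0xFFFF_FFFF


def imm_ext(imm16: int, ext_sel: str) -> int:
    """Immediate extender; ext_sel is one of the ExtSel values sext16, uext16, High16."""
    imm16 &= 0xFFFF
    if ext_sel == "sext16":            # replicate bit 15 into bits 31:16
        return (imm16 | 0xFFFF_0000) if imm16 & 0x8000 else imm16
    if ext_sel == "uext16":            # zero-fill bits 31:16
        return imm16
    if ext_sel == "High16":            # ASSUMPTION: place the immediate in bits 31:16
        return (imm16 << 16) & MASK32
    raise ValueError(f"unknown ExtSel {ext_sel!r}")


def reg_dst_mux(rt: int, rd: int, reg_dst: str) -> int:
    """RegDst mux: choose the register-file write address."""
    return {"rt": rt, "rd": rd, "R31": 31}[reg_dst]


def bsrc_mux(reg_value: int, ext_imm: int, bsrc: str) -> int:
    """BSrc mux: choose the ALU's second operand."""
    return {"Reg": reg_value, "Imm": ext_imm}[bsrc]


# ADDIU-style path: write rt, second operand is the sign-extended immediate.
print(hex(imm_ext(0xFFFC, "sext16")))                        # prints: 0xfffffffc
print(reg_dst_mux(rt=2, rd=3, reg_dst="rt"))                 # prints: 2
print(hex(bsrc_mux(0x10, imm_ext(0x0004, "uext16"), "Imm")))  # prints: 0x4
```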

Approach for Program and Data Memory

Harvard-style: separate program and data memories
Inspired by Howard Aiken and the Mark I
Read-only program memory
Read/write data memory
Need some way to load the program memory

Princeton-style: unified program and data memories
Inspired by von Neumann
Single read/write memory for both
Load/store instructions require accessing memory twice during execution

Most modern machines are mixed, with separate instruction and data caches but a unified main memory that holds both the program and data
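The "accessing memory twice" point determines which style a single-cycle datapath can use. A minimal Python sketch, assuming each memory has exactly one port and serves one access per cycle; the instruction classes and function names are illustrative.

```python
# Data-memory accesses per instruction class (every instruction is also fetched once).
DATA_ACCESSES = {"ALU": 0, "LW": 1, "SW": 1, "BEQZ": 0, "JAL": 0}


def memory_accesses(kind: str) -> int:
    """Total memory accesses: one instruction fetch plus any load/store access."""
    return 1 + DATA_ACCESSES[kind]


def fits_in_one_cycle(kind: str, style: str) -> bool:
    """Can a single-cycle datapath finish this instruction with single-ported memories?

    Harvard: the fetch uses instruction memory and a load/store uses data memory,
    so each memory sees at most one access. Princeton: both go to one memory.
    """
    if style == "harvard":
        return DATA_ACCESSES[kind] <= 1
    if style == "princeton":
        return memory_accesses(kind) <= 1
    raise ValueError(f"unknown style {style!r}")


for kind in DATA_ACCESSES:
    print(kind, fits_in_one_cycle(kind, "harvard"), fits_in_one_cycle(kind, "princeton"))
```

Under this one-port assumption, loads and stores cannot complete in one cycle with a unified memory, which is consistent with the single-cycle datapath in this topic being Harvard-style.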

Datapath: Load Instructions (LW)
[Figure: the ALU adds R[rs] and the sign-extended offset to form the data-memory address (addr); the data memory's rdata is selected by the WBSrc mux (ALU / Mem) and written back to the register file; control signals RegWrite, MemWrite, RegDst, ExtSel, OpSel, BSrc, and WBSrc]

Datapath: Store Instructions (SW)
[Figure: same datapath; the ALU computes the address from R[rs] and the sign-extended offset, the second register read value drives wdata, and MemWrite is asserted]

Datapath: Conditional Branches (BEQZ)
[Figure: adds a second adder that computes the branch target from pc+4 and the extended offset, and a zero? test on the register value; the PCSrc mux selects pc+4 or br]

Datapath: Register-Indirect Jumps (JR)
[Figure: the PCSrc mux gains an rind input that takes the next PC from the register file]

Datapath: Register-Indirect Jump-&-Link (JALR)
[Figure: adds register 31 as a write destination so that the link address can be written back]

Datapath: Absolute Jump-&-Link (J, JAL)
[Figure: the PCSrc mux gains a jabs input for the absolute jump target]
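With the jump datapaths added, the PCSrc mux has four inputs. A minimal Python sketch of the next-PC logic; the function names are hypothetical, and the body of `jtarg` is an assumption (the slides name jtarg( PC, target ) without defining it) based on the standard MIPS rule of keeping the upper 4 bits of PC+4.

```python
MASK32 = 0xFFFF_FFFF


def sext16(x: int) -> int:
    """Sign-extend a 16-bit offset."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x


def jtarg(pc: int, target: int) -> int:
    """Absolute jump target (ASSUMPTION: the standard MIPS rule).

    Upper 4 bits come from PC+4; the 26-bit target supplies bits 27:2.
    """
    return ((pc + 4) & 0xF000_0000) | ((target & 0x03FF_FFFF) << 2)


def next_pc(pc_src: str, pc: int, offset: int = 0, rs_value: int = 0, target: int = 0) -> int:
    """PCSrc mux: pc+4, br (BEQZ taken), rind (JR/JALR), or jabs (J/JAL)."""
    if pc_src == "pc+4":
        return (pc + 4) & MASK32
    if pc_src == "br":
        return (pc + 4 + sext16(offset) * 4) & MASK32
    if pc_src == "rind":
        return rs_value & MASK32
    if pc_src == "jabs":
        return jtarg(pc, target)
    raise ValueError(f"unknown PCSrc {pc_src!r}")


print(hex(next_pc("pc+4", 0x1000)))                  # prints: 0x1004
print(hex(next_pc("br", 0x1000, offset=0xFFFF)))     # prints: 0x1000 (offset -1)
print(hex(next_pc("rind", 0x1000, rs_value=0x400)))  # prints: 0x400
print(hex(next_pc("jabs", 0x1000, target=0x100)))    # prints: 0x400
```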

Final Harvard-Style Datapath for MIPS
[Figure: complete single-cycle datapath with separate instruction and data memories; control signals PCSrc (pc+4 / br / rind / jabs), RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel, and BSrc, with the zero? status fed back to Control]
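The finished datapath executes one whole instruction per clock cycle. Below is a behavioral Python sketch of that, covering ADDU, ADDIU, LW, SW, BEQZ, JR, and JALR (J/JAL are omitted for brevity). The class, the pre-decoded tuple format such as ("addiu", rt, rs, imm), and the word-array data memory are assumptions of this sketch; the JALR link value PC+8 follows the instruction-format slide.

```python
MASK32 = 0xFFFF_FFFF


def sext16(x: int) -> int:
    """Sign-extend a 16-bit immediate/offset."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x


class SingleCycleCPU:
    """Behavioral single-cycle model: each step() call is one clock cycle (CPI = 1).

    Harvard style: separate instruction and data memories. Instructions are
    pre-decoded tuples, e.g. ("addiu", rt, rs, imm); binary decoding is skipped.
    Data memory is a list of 32-bit words indexed by byte address // 4 (aligned only).
    """

    def __init__(self, program, dmem_words: int = 64) -> None:
        self.imem = list(program)
        self.dmem = [0] * dmem_words
        self.regs = [0] * 32
        self.pc = 0

    def _write_reg(self, r: int, value: int) -> None:
        if r != 0:                                   # register 0 is hardwired to zero
            self.regs[r] = value & MASK32

    def step(self) -> None:
        R = self.regs
        op, *f = self.imem[self.pc // 4]             # instruction fetch
        next_pc = (self.pc + 4) & MASK32             # default PCSrc = pc+4
        if op == "addu":                             # R[rd] <- R[rs] + R[rt]
            rd, rs, rt = f
            self._write_reg(rd, R[rs] + R[rt])
        elif op == "addiu":                          # R[rt] <- R[rs] + sext(imm)
            rt, rs, imm = f
            self._write_reg(rt, R[rs] + sext16(imm))
        elif op == "lw":                             # R[rt] <- M[R[rs] + sext(offset)]
            rt, rs, off = f
            self._write_reg(rt, self.dmem[((R[rs] + sext16(off)) & MASK32) // 4])
        elif op == "sw":                             # M[R[rs] + sext(offset)] <- R[rt]
            rt, rs, off = f
            self.dmem[((R[rs] + sext16(off)) & MASK32) // 4] = R[rt]
        elif op == "beqz":                           # if R[rs] == 0: PC <- PC+4 + offset*4
            rs, off = f
            if R[rs] == 0:
                next_pc = (self.pc + 4 + sext16(off) * 4) & MASK32
        elif op == "jr":                             # PC <- R[rs]
            (rs,) = f
            next_pc = R[rs]
        elif op == "jalr":                           # PC <- R[rs]; R[31] <- PC+8 (per the slide)
            (rs,) = f
            next_pc = R[rs]                          # read before the link write
            self._write_reg(31, self.pc + 8)
        else:
            raise ValueError(f"unsupported instruction {op!r}")
        self.pc = next_pc                            # PC updates at the clock edge


# Sum 3 + 2 + 1 with a countdown loop, then store and reload the result.
program = [
    ("addiu", 1, 0, 3),        # 0x00: $1 = 3  (counter)
    ("addiu", 2, 0, 0),        # 0x04: $2 = 0  (sum)
    ("beqz", 1, 3),            # 0x08: if $1 == 0 goto 0x18
    ("addu", 2, 2, 1),         # 0x0C: $2 = $2 + $1
    ("addiu", 1, 1, 0xFFFF),   # 0x10: $1 = $1 - 1
    ("beqz", 0, 0xFFFC),       # 0x14: always taken, back to 0x08
    ("sw", 2, 0, 8),           # 0x18: M[8] = $2
    ("lw", 3, 0, 8),           # 0x1C: $3 = M[8]
]
cpu = SingleCycleCPU(program)
cycles = 0
while cpu.pc < 4 * len(program):
    cpu.step()
    cycles += 1
print(cpu.regs[2], cpu.regs[3], cycles)   # prints: 6 6 17
```

Since every instruction takes exactly one step, the cycle count equals the dynamic instruction count (17 here), matching CPI = 1 in the Iron Law table.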

Hardwired Controller is Pure Combinational Logic
[Figure: combinational logic maps the opcode and zero? to ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite, and PCSrc; a decode map combines OpSel (Func, Op, +, 0?) with Inst<5:0> (Func) and Inst<31:26> (Opcode) to produce the ALU operation]
ExtSel = ( sext16, uext16, High16 )

Hardwired Control Table

Opcode      ExtSel   BSrc  OpSel  MemW  RegW  WBSrc  RegDst  PCSrc
ALU         *        Reg   Func   no    yes   ALU    rd      pc+4
ALUi        sext16   Imm   Op     no    yes   ALU    rt      pc+4
ALUiu       uext16   Imm   Op     no    yes   ALU    rt      pc+4
LW          sext16   Imm   +      no    yes   Mem    rt      pc+4
SW          sext16   Imm   +      yes   no    *      *       pc+4
BEQZ z=0    sext16   *     0?     no    no    *      *       pc+4
BEQZ z=1    sext16   *     0?     no    no    *      *       br
J           *        *     *      no    no    *      *       jabs
JAL         *        *     *      no    yes   PC     R31     jabs
JR          *        *     *      no    no    *      *       rind
JALR        *        *     *      no    yes   PC     R31     rind

BSrc = Reg / Imm
WBSrc = ALU / Mem / PC
RegDst = rt / rd / R31
PCSrc = pc+4 / br / rind / jabs

(Table source: CS152, Spring 2010, January 26, 2010.)
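Because the controller is pure combinational logic, the table above can be read as a lookup function. A minimal Python sketch; the opcode class names stand in for real opcode bits, and using `None` for the "*" (don't-care) entries is a representation choice of this sketch.

```python
# Hardwired control table as a lookup; None stands for a don't-care ("*") entry.
FIELDS = ("ExtSel", "BSrc", "OpSel", "MemW", "RegW", "WBSrc", "RegDst", "PCSrc")
X = None

CONTROL_TABLE = {
    "ALU":       (X,        "Reg", "Func", False, True,  "ALU", "rd",  "pc+4"),
    "ALUi":      ("sext16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "ALUiu":     ("uext16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "LW":        ("sext16", "Imm", "+",    False, True,  "Mem", "rt",  "pc+4"),
    "SW":        ("sext16", "Imm", "+",    True,  False, X,     X,     "pc+4"),
    ("BEQZ", 0): ("sext16", X,     "0?",   False, False, X,     X,     "pc+4"),
    ("BEQZ", 1): ("sext16", X,     "0?",   False, False, X,     X,     "br"),
    "J":         (X,        X,     X,      False, False, X,     X,     "jabs"),
    "JAL":       (X,        X,     X,      False, True,  "PC",  "R31", "jabs"),
    "JR":        (X,        X,     X,      False, False, X,     X,     "rind"),
    "JALR":      (X,        X,     X,      False, True,  "PC",  "R31", "rind"),
}


def control(opcode: str, zero: bool = False) -> dict:
    """Combinational decode: (opcode class, zero? flag) -> control signals."""
    key = ("BEQZ", int(zero)) if opcode == "BEQZ" else opcode
    return dict(zip(FIELDS, CONTROL_TABLE[key]))


print(control("LW"))
print(control("BEQZ", zero=True)["PCSrc"])   # prints: br
```

The zero? input matters only for BEQZ, which is why that opcode needs two rows. MemW and RegW are never don't-cares, since asserting either by accident would corrupt architectural state.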

Single-Cycle Hardwired Control
Requires that the clock period be long enough for all of the following steps to complete:
1. Instruction fetch
2. Decode and register fetch
3. ALU operation
4. Data read or data store, if required
5. Register write-back setup time, if required

$t_c > t_{ifetch} + t_{rfrd} + t_{ALU} + t_{dmem} + t_{rfwr}$

At the rising edge of the clock, the PC, the register file, and the memory are updated.
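Different instructions use different subsets of these steps, but a single clock must serve them all. A minimal Python sketch, with hypothetical component delays and a simplified mapping of instruction classes to steps:

```python
# Hypothetical component delays in picoseconds; real values depend on the technology.
DELAY_PS = {"ifetch": 200, "rfrd": 100, "alu": 150, "dmem": 200, "rfwr": 50}

# Steps each instruction class uses in the single-cycle datapath (simplified).
STEPS = {
    "ALU":  ["ifetch", "rfrd", "alu", "rfwr"],
    "LW":   ["ifetch", "rfrd", "alu", "dmem", "rfwr"],
    "SW":   ["ifetch", "rfrd", "alu", "dmem"],
    "BEQZ": ["ifetch", "rfrd", "alu"],
}


def path_delay(kind: str) -> int:
    """Combinational delay from PC to the state elements for one instruction class."""
    return sum(DELAY_PS[step] for step in STEPS[kind])


for kind in STEPS:
    print(kind, path_delay(kind), "ps")

# One clock serves every instruction, so the slowest path sets the period.
clock_period_ps = max(path_delay(kind) for kind in STEPS)
print("t_c must exceed", clock_period_ps, "ps")   # prints: t_c must exceed 700 ps
```

The load path uses all five steps, so it sets the clock period, and faster instructions such as ALU operations waste the remainder of each cycle.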

Multi-Cycle Unpipelined MIPS Processor

Multi-Cycle Unpipelined Datapath
[Figure: PC, 0x4 adder, instruction memory, instruction register (IR), GPRs, Imm Ext, ALU, and data memory, divided into fetch, decode & register-fetch, execute, memory, and write-back phases]
The clock period is reduced by dividing the execution of an instruction into multiple cycles; this also allows for a more realistic synchronous memory.
$t_c > \max(t_{ifetch}, t_{rfrd}, t_{ALU}, t_{dmem}, t_{rfwr})$
CPI will of course be greater than one.
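The two clock-period constraints can be compared directly. This Python sketch uses hypothetical per-phase delays, ignores register overhead between phases, and takes the multi-cycle cycle counts (5, 3, 5) from this topic's summary.

```python
# Hypothetical per-phase delays (ps); no register overhead is modeled.
DELAY_PS = {"ifetch": 200, "rfrd": 100, "alu": 150, "dmem": 200, "rfwr": 50}

single_cycle_tc = sum(DELAY_PS.values())   # every phase inside one cycle
multi_cycle_tc = max(DELAY_PS.values())    # one phase per cycle

# Cycle counts for Inst 1, Inst 2, Inst 3 on the multi-cycle machine (summary slide).
multi_cycles = [5, 3, 5]
n = len(multi_cycles)

single_time = n * 1 * single_cycle_tc                 # CPI = 1
multi_time = sum(multi_cycles) * multi_cycle_tc       # CPI = 13 / 3

print(f"single-cycle: t_c={single_cycle_tc} ps, time={single_time} ps")
print(f"multi-cycle:  t_c={multi_cycle_tc} ps, CPI={sum(multi_cycles) / n:.2f}, time={multi_time} ps")
```

With these made-up delays the multi-cycle design loses (2600 ps vs. 2100 ps): its clock is only 3.5× faster while its CPI is 4.33× higher. It comes out ahead when phase delays are better balanced or when more instructions skip phases.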

Multi-Cycle Unpipelined Controller
[Figure: multicycle PARCv1 state diagram, from ECE 4750 Computer Architecture, Fall 2011, Lab 2: Multicycle PARCv2 Processor, Appendix, Figure 2]

Summary

Microcoding is less attractive due to evolving technology constraints
Unpipelined µarch is the first step toward the RISC design philosophy
The Iron Law of processor performance helps explain the design space

Microcoded (CPI = 7.33): Inst 1 7 cycles, Inst 2 5 cycles, Inst 3 10 cycles
Single-Cycle Unpipelined (CPI = 1): Inst 1 1 cycle, Inst 2 1 cycle, Inst 3 1 cycle
Multi-Cycle Unpipelined (CPI = 4.33): Inst 1 5 cycles, Inst 2 3 cycles, Inst 3 5 cycles

Microarchitecture            CPI   Cycle Time
Microcoded                   >1    short        (last topic)
Single-Cycle Unpipelined     1     long         (this topic)
Multi-Cycle Unpipelined      >1    short        (this topic)
Pipelined                    1     short        (next topic)

Acknowledgements
Some of these slides contain material developed and copyrighted by: Arvind (MIT), Krste Asanović (MIT/UCB), Joel Emer (Intel/MIT), James Hoe (CMU), John Kubiatowicz (UCB), and David Patterson (UCB).
MIT material derived from course 6.823.
UCB material derived from courses CS152 and CS252.