Lecture 08: RISC-V Single-Cycle Implementa9on CSE 564 Computer Architecture Summer 2017

Similar documents
Chapter 4. The Processor

Lecture 3 - From CISC to RISC

Processor (I) - datapath & control. Hwansoo Han

Chapter 4. The Processor

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Chapter 4. The Processor Designing the datapath

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Systems Architecture

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

Lecture 3 - From CISC to RISC

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Lecture 4 - Pipelining

EC 513 Computer Architecture

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. The Hardware/Software Interface. Chapter 4. The Processor: A Based on P&H

TDT4255 Computer Design. Lecture 4. Magnus Jahre. TDT4255 Computer Design

Chapter 4. The Processor

ECE 552 / CPS 550 Advanced Computer Architecture I. Lecture 4 Reduced Instruction Set Computers

ECE260: Fundamentals of Computer Engineering

ECE232: Hardware Organization and Design

Chapter 4. The Processor

Introduction. Datapath Basics

Chapter 4 The Processor 1. Chapter 4A. The Processor

CS 152 Computer Architecture and Engineering. Lecture 3 - From CISC to RISC. Last Time in Lecture 2

The MIPS Processor Datapath

COMPUTER ORGANIZATION AND DESIGN

C 1. Last Time. CSE 490/590 Computer Architecture. ISAs and MIPS. Instruction Set Architecture (ISA) ISA to Microarchitecture Mapping

Introduction. Chapter 4. Instruction Execution. CPU Overview. University of the District of Columbia 30 September, Chapter 4 The Processor 1

CS 152 Computer Architecture and Engineering. Lecture 3 - From CISC to RISC

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

Single Cycle Datapath

Single Cycle Datapath

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Simple Instruction Pipelining

Inf2C - Computer Systems Lecture Processor Design Single Cycle

Lecture 7 Pipelining. Peng Liu.

CPU Organization (Design)

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

The Big Picture: Where are We Now? EEM 486: Computer Architecture. Lecture 3. Designing a Single Cycle Datapath

Lecture Topics. Announcements. Today: Single-Cycle Processors (P&H ) Next: continued. Milestone #3 (due 2/9) Milestone #4 (due 2/23)

Full Name: NetID: Midterm Summer 2017

Lecture 6 Datapath and Controller

Topic #6. Processor Design

CS3350B Computer Architecture Winter 2015

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

CS 152 Computer Architecture and Engineering. Lecture 3 - From CISC to RISC

Design of Digital Circuits 2017 Srdjan Capkun Onur Mutlu (Guest starring: Frank K. Gürkaynak and Aanjhan Ranganathan)

COMPUTER ORGANIZATION AND DESIGN

Review: Abstract Implementation View

COMPUTER ORGANIZATION AND DESIGN

Computer Science 61C Spring Friedland and Weaver. The MIPS Datapath

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

COMP303 - Computer Architecture Lecture 8. Designing a Single Cycle Datapath

LECTURE 5. Single-Cycle Datapath and Control

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

Major CPU Design Steps

EECS150 - Digital Design Lecture 10- CPU Microarchitecture. Processor Microarchitecture Introduction

EECS150 - Digital Design Lecture 9- CPU Microarchitecture. Watson: Jeopardy-playing Computer

Lecture 12: Single-Cycle Control Unit. Spring 2018 Jason Tang

CENG 3420 Lecture 06: Datapath

RISC Processor Design

CSSE232 Computer Architecture I. Datapath

CPE 335 Computer Organization. Basic MIPS Architecture Part I

Chapter 5: The Processor: Datapath and Control

Lecture 09: RISC-V Pipeline Implementa8on CSE 564 Computer Architecture Summer 2017

ENGN1640: Design of Computing Systems Topic 04: Single-Cycle Processor Design

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

Full Datapath. CSCI 402: Computer Architectures. The Processor (2) 3/21/19. Fengguang Song Department of Computer & Information Science IUPUI

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Levels in Processor Design

ECE 486/586. Computer Architecture. Lecture # 7

The Processor: Datapath & Control

CS61C : Machine Structures

CS Computer Architecture Spring Week 10: Chapter

CS 61C: Great Ideas in Computer Architecture. Lecture 13: Pipelining. Krste Asanović & Randy Katz

RISC Architecture: Multi-Cycle Implementation

ECE369. Chapter 5 ECE369

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

ELEC3441: Computer Architecture Second Semester, Feb 23, Quiz 2


RISC Architecture: Multi-Cycle Implementation

UC Berkeley CS61C : Machine Structures

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

CSEN 601: Computer System Architecture Summer 2014

ECE170 Computer Architecture. Single Cycle Control. Review: 3b: Add & Subtract. Review: 3e: Store Operations. Review: 3d: Load Operations

Lecture 10: Simple Data Path

CS61C : Machine Structures

CS 152 Computer Architecture and Engineering. Lecture 3 - From CISC to RISC

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

CS250 Section 4. 9/21/10 Yunsup Lee. Image Courtesy: Tilera

CC 311- Computer Architecture. The Processor - Control

Ch 5: Designing a Single Cycle Datapath

CS222: Processor Design

CS3350B Computer Architecture Quiz 3 March 15, 2018

Lecture 4: Review of MIPS. Instruction formats, impl. of control and datapath, pipelined impl.

CSE140: Components and Design Techniques for Digital Systems

Transcription:

Lecture 08: RISC-V Single-Cycle Implementa9on CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan 1

Acknowledgements The notes cover Appendix C of the textbook, but we use RISC-V instead of MIPS ISA Slides for general RISC ISA implementalon are adapted from Lecture slides for Computer OrganizaLon and Design, FiRh EdiLon: The Hardware/SoRware Interface textbook for general RISC ISA implementalon Slides for RISC-V single-cycle implementalon are adapted from Computer Science 152: Computer Architecture and Engineering, Spring 2016 by Dr. George Michelogiannakis from UC Berkeley 2

Introduc9on CPU performance factors CPU Time = Instructions InstrucLon count Program * Cycles Instruction *Time Cycle Determined by ISA and compiler CPI and Cycle Lme Determined by CPU hardware Simple subset, shows most aspects Memory reference: lw, sw ArithmeLc/logical: add, sub, and, or, slt Control transfer: beq, j

Components of a Computer Processor Control Enable? Read/Write Memory Input Datapath PC Address Program Bytes Registers Write Data ArithmeLc & Logic Unit (ALU) Read Data Data Output Processor-Memory Interface I/O-Memory Interfaces 4

The CPU Processor (CPU): the aclve part of the computer that does all the work (data manipulalon and decisionmaking) Datapath: porlon of the processor that contains hardware necessary to perform operalons required by the processor (the brawn) Control : porlon of the processor (also in hardware) that tells the datapath what needs to be done (the brain) 5

Datapath and Control Datapath designed to support data transfers required by instruclons Controller causes correct transfers to happen PC instruction memory rd rs rt registers ALU Data memory +4 imm opcode, funct Controller 6

Logic Design Basics Information encoded in binary Low voltage = 0, High voltage = 1 One wire per bit Multi-bit data encoded on multi-wire buses Combinational circuit Operate on data Output is a function of input State (sequential) circuit Store information

Combinational Circuits AND-gate Y = A & B n Adder n Y = A + B A B + Y A B I0 I1 M u x S Y n MulLplexer n Y = S? I1 : I0 Y n ArithmeLc/Logic Unit n Y = F(A, B) A ALU Y B F Chapter 4 The Processor 8

Sequential Circuits Register: stores data in a circuit Uses a clock signal to determine when to update the stored value Edge-triggered: update when Clk changes from 0 to 1 D Clk Q Clk D Q

Edge-Triggered D Flip Flops D Q Value of D is sampled on posi9ve clock edge. Q outputs sampled value for rest of cycle. CLK D Q

Sequential Circuits Register with write control Only updates on clock edge when write control input is 1 Used when stored value is required later Clk D Write Clk Q Write D Q

Clocking Methodology Combinational logic transforms data during clock cycles Between clock edges Input from state elements, output to state element Longest delay determines clock period

Single cycle data paths Processor uses synchronous logic design (a clock ). f! T! 1 MHz! 1 μs! 10 MHz! 100 ns! 100 MHz! 10 ns! 1 GHz! 1 ns! All state elements act like posilve edge-triggered flip flops. D Q Reset? clk

Hardware Elements of CPU CombinaLonal circuits Mux, Decoder, ALU,... A 0 A 1 A n-1. Sel Mux lg(n) O A lg(n) Decoder. O 0 O 1 O n-1 A B OpSelect - Add, Sub,... - And, Or, Xor, Not,... - GT, LT, EQ, Zero,... ALU Result Comp? Synchronous state elements Flipflop, Register, Register file, SRAM, DRAM En Clk D ff Q Clk En D Q Edge-triggered: Data is sampled at the rising edge

Reads are combinalonal Register Files D 0 register D 1 D 2... D n-1 En Clk ff ff ff... ff Q 0 Q 1 Q 2... Q n-1 Clock WE ReadSel1 ReadSel2 WriteSel WriteData rs1 rs2 ws wd we Register file 2R+1W rd1 rd2 ReadData1 ReadData2 15

Register File Implementa9on RISC-V integer instruclons have at most 2 register source operands rd clk wdata rdata1 rdata2 5 32 32 32 reg 0 rs1 5 rs2 5 we reg 1 reg 31 16

A Simple Memory Model WriteEnable Clock Address WriteData MAGIC RAM ReadData Reads and writes are always completed in one cycle a Read can be done any Lme (i.e. combinalonal) a Write is performed at the rising clock edge if it is enabled => the write address and data must be stable at the clock edge Later in the course we will present a more realis:c model of memory 17

Five Stages of Instruc9on Execu9on Stage 1: InstrucLon Fetch Stage 2: InstrucLon Decode Stage 3: ALU (ArithmeLc-Logic Unit) Stage 4: Memory Access Stage 5: Register Write 18

Stages of Execu9on on Datapath PC instruction memory rd rs rt registers ALU Data memory +4 imm 1. InstrucLon Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Register Write 19

Stages of Execu9on (1/5) There is a wide variety of instruclons: so what general steps do they have in common? Stage 1: InstrucLon Fetch The 32-bit instruclon word must first be fetched from memory the cache-memory hierarchy also, this is where we Increment PC PC = PC + 4, to point to the next instruclon: byte addressing so + 4 20

Stages of Execu9on (2/5) Stage 2: InstrucLon Decode: gather data from the fields (decode all necessary instruclon data) 1. read the opcode to determine instruclon type and field lengths 2. read in data from all necessary registers for add, read two registers for addi, read one register for jal, no reads necessary 21

Stages of Execu9on (3/5) Stage 3: ALU (ArithmeLc-Logic Unit): the real work of most instruclons is done here AL operalons: arithmelc (+, -, *, /), shiring, logic (&, ), comparisons (slt) loads and stores lw $t0, 40($t1) the address we are accessing in memory = the value in $t1 PLUS the value 40 AddiLon is done in this stage CondiLonal branch Comparison is done in this stage (one solulon) 22

Stages of Execu9on (4/5) Stage 4: Memory Access: only load and store instruclons the others remain idle during this stage or skip it all together since these instruclons have a unique step, we need this extra stage to account for them as a result of the cache system, this stage is expected to be fast 23

Stages of Execu9on (5/5) Stage 5: Register Write most instruclons write the result of some computalon into a register examples: arithmelc, logical, shirs, loads, slt what about stores, branches, jumps? don t write anything into a register at the end these remain idle during this firh stage or skip it all together 24

Stages of Execu9on on Datapath PC instruction memory rd rs rt registers ALU Data memory +4 imm 1. InstrucLon Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Register Write 25

Instruction Execution PC instruction memory, fetch instruction Register numbers register file, read registers Depending on instruction class Use ALU to calculate Arithmetic result Memory address for load/store Branch condition and target address Access data memory for load/store PC target address or PC + 4

CPU Components

Multiplexers n Can t just join wires together n Use mullplexers

Control Signals

Building a Datapath Datapath Elements that process data and addresses in the CPU Registers, ALUs, mux s, memories, We will build a RISCV datapath incrementally Refining the overview design

Instruction Fetch 32-bit register Increment by 4 for next instruclon

R-Format Instructions Read two register operands Perform arithmetic/logical operation Write register result

Load/Store Instructions Read register operands Calculate address using 12-bit offset Use ALU, but sign-extend offset Load: Read memory and update register Store: Write register value to memory

Branch Instructions Read register operands Compare operands Use ALU, subtract and check Zero output Calculate target address Sign-extend displacement Shift left 2 places (word displacement) Add to PC + 4 Already calculated by instruction fetch

Branch Instructions Just re-routes wires Sign-bit wire replicated

Composing the Elements First-cut data path does an instruction in one clock cycle Each datapath element can only do one function at a time Hence, we need separate instruction and data memories Use multiplexers where alternate data sources are used for different instructions

R-Type/Load/Store Datapath

Full Datapath

ALU Control ALU used for Load/Store: F = add Branch: F = subtract R-type: F depends on funct field ALU control Function 0000 AND 0001 OR 0010 add 0110 subtract 0111 set-on-less-than 1100 NOR

ALU Control Assume 2-bit ALUOp derived from opcode Combinational logic derives ALU control opcode ALUOp Operation funct ALU function ALU control lw 00 load word XXXXXX add 0010 sw 00 store word XXXXXX add 0010 beq 01 branch equal XXXXXX subtract 0110 R-type 10 add 100000 add 0010 subtract 100010 subtract 0110 AND 100100 AND 0000 OR 100101 OR 0001 set-on-less-than 101010 set-on-less-than 0111

Datapath: Reg-Reg ALU Instruc9ons 0x4 Add RegWriteEn clk PC clk addr inst Inst. Memory Inst<19:15> Inst<24:20> Inst<11:7> we rs1 rs2 rd1 wa wd rd2 GPRs ALU Inst<14:12> ALU Control OpCode RegWrite Timing? 7 5 5 3 5 7 func7 rs2 rs1 func3 rd opcode rd (rs1) func (rs2) 31 25 24 20 19 15 14 12 11 7 6 0 41

Datapath: Reg-Imm ALU Instruc9ons 0x4 Add RegWriteEn clk PC clk addr inst Inst. Memory Inst<19:15> Inst<11:7> Inst<31:20> Inst<14:12> we rs1 rs2 rd1 wa wd rd2 GPRs Imm Select ALU Control ALU OpCode ImmSel 12 5 3 5 7 immediate12 rs1 func3 rd opcode rd (rs1) op immediate 31 20 19 15 14 12 11 7 6 0 42

Conflicts in Merging Datapath PC clk 0x4 Add addr inst Inst. Memory Inst<19:15> Inst<24:20> Inst<11:7> Inst<31:20> Inst<14:12> RegWrite clk we rs1 rs2 rd1 wa wd rd2 GPRs Imm Select ALU Control ALU Introduce muxes OpCode ImmSel 7 5 5 3 5 7 func7 rs2 rs1 func3 rd opcode rd (rs1) func (rs2) immediate12 rs1 func3 rd opcode rd (rs1) op immediate 31 20 19 15 14 12 11 7 6 0 43

Datapath for ALU Instruc9ons 0x4 Add RegWriteEn clk PC clk addr inst Inst. Memory <19:15> <24:20> <11:7> Inst<31:20> we rs1 rs2 rd1 wa wd rd2 GPRs Imm Select ALU <14:12> ALU Control <6:0> OpCode ImmSel Op2Sel Reg / Imm 7 5 5 3 5 7 func7 rs2 rs1 func3 rd opcode rd (rs1) func (rs2) immediate12 rs1 func3 rd opcode rd (rs1) op immediate 31 20 19 15 14 12 11 7 6 0 44

Load/Store Instruc9ons PC clk 0x4 Add addr inst Inst. Memory base disp RegWriteEn clk we rs1 rs2 rd1 wa wd rd2 GPRs Imm Select ALU Control ALU MemWrite clk we addr rdata Data Memory wdata WBSel ALU / Mem OpCode ImmSel Op2Sel 7 5 5 3 5 7 imm rs2 rs1 func3 imm opcode Store (rs1) + displacement immediate12 rs1 func3 rd opcode Load 31 20 19 15 14 12 11 7 6 0 rs1 is the base register rd is the destination of a Load, rs2 is the data source for a Store 45

RISC-V Condi9onal Branches 7 imm 31 25 5 rs2 24 20 5 rs1 19 15 3 func3 14 12 5 imm 11 7 7 opcode 6 0 BEQ/BNE BLT/BGE BLTU/BGEU Compare two integer registers for equality (BEQ/BNE) or signed magnitude (BLT/BGE) or unsigned magnitude (BLTU/ BGEU) 12-bit immediate encodes branch target address as a signed offset from PC, in units of 16-bits (i.e., shir ler by 1 then add to PC). 46

Condi9onal Branches (BEQ/BNE/BLT/BGE/BLTU/BGEU) PCSel br RegWrEn MemWrite WBSel pc+4 0x4 Add Add clk PC clk addr inst Inst. Memory we rs1 rs2 rd1 wa wd rd2 GPRs Imm Select Br Logic ALU Control Bcomp? ALU clk we addr rdata Data Memory wdata OpCode ImmSel Op2Sel 47

Full RISCV1Stage Datapath PC JumpReg TargGen rs1 rs2 Branch CondGen br_eq? br_lt? br_ltu? RISC-V Sodor 1-Stage pc+4 jalr branch jump exception pc_sel PC +4 addr val Instruction Mem PC+4 data Inst ir[31:25], ir[11:7] ir[31:12] ir[31:20] ir[31:20] ir[31:12] Branch TargGen Jump TargGen IType Sign Extend SType Sign Extend UType PC Op2Sel ALU co-processor (CSR) registers wb_sel ir[11:7] wa rf_wen en ir[24:20] ir[19:15] addr Reg File data rs2 rs1 wd Reg File Op1Sel AluFun Decoder Control Signals rs2 rdata addr wdata Data Mem Note: for simplicity, the CSR File (control and status registers) and associated datapath is not shown Execute Stage mem_rw mem_val 48

Hardwired Control is pure Combina9onal Logic op code Equal? combinalonal logic ImmSel Op2Sel FuncSel MemWrite WBSel WASel RegWriteEn PCSel 49

ALU Control & Immediate Extension Inst<14:12> (Func3) Inst<6:0> (Opcode) + 0? ALUop FuncSel ( Func, Op, +, 0? ) Decode Map ImmSel ( IType 12, SType 12, UType 20 ) 50

Hardwired Control Table Opcode ImmSel Op2Sel FuncSel MemWr RFWen WBSel WASel PCSel ALU ALUi LW SW BEQ true BEQ false J JAL JALR * Reg Func no yes ALU rd IType 12 Imm Op no yes ALU rd pc+4 IType 12 Imm + no yes Mem rd pc+4 SType 12 Imm + yes no * * pc+4 SBType 12 * * no no * * SBType 12 * * no no * * * * * no no * * pc+4 pc+4 jabs * * * no yes PC X1 jabs * * * no yes PC rd rind br Op2Sel= Reg / Imm WBSel = ALU / Mem / PC WASel = rd / X1 PCSel = pc+4 / br / rind / jabs 51

Single-Cycle Hardwired Control clock period is sufficiently long for all of the following steps to be completed : 1. InstrucLon fetch 2. Decode and register fetch 3. ALU operalon 4. Data fetch if required 5. Register write-back setup Lme => tc > tifetch + trfetch + talu+ tdmem+ trwb At the rising edge of the following clock, the PC, register file and memory are updated 52

Implementa9on in Real Load-Store RISC ISAs designed for efficient pipelined implementalons Inspired by earlier Cray machines (CDC 6600/7600) RISC-V ISA implemented using Chisel hardware construclon language Chisel: h}ps://chisel.eecs.berkeley.edu/ Ge~ng started: h}ps://chisel.eecs.berkeley.edu/2.2.0/ge~ng-started.html Check resource page for slides and other info 53

Chisel in one slides Module IO Wire Reg Mem 54

UCB RISC-V Sodor h}ps://github.com/ucb-bar/riscv-sodor Single-cycle: h}ps://github.com/ucb-bar/riscv-sodor/tree/master/src/ rv32_1stage 55