18-447 Lecture 3: Single-Cycle Microarchitecture
James C. Hoe
Department of ECE, Carnegie Mellon University
18-447-S18-L03-S1, James C. Hoe, CMU/ECE/CALCM, ©2018

Housekeeping
- Your goal today
  - first try at implementing the RV32I ISA
- Notices
  - Handout #4: HW, due 2/7
  - Student survey on Canvas, past due
  - Lab, Part A, due week of 1/29
  - Lab, Part B, due week of 2/5
- Readings
  - P&H Ch 4.1~4.4
  - finish P&H Ch 2 for next time

Instruction Processing FSM
[Figure: FSM with input I, state S, next-state logic, and output O]
- An ISA describes an abstract FSM
  - state = program-visible state
  - next-state logic = instruction execution
- Nice ISAs have atomic instruction semantics
  - one state transition per instruction in the abstract FSM
- The implementation FSM can look wildly different

Program Visible State (aka Architectural State)
[Figure: the PC; the register file with three 5-bit register select inputs (read register 1, read register 2, write register), two read data outputs, a write data input, and RegWrite; the instruction memory; and the data memory with address and data ports]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

"Magic" Memory and Register File
- Combinational read
  - the output of the read data port is a combinational function of the register file contents and the corresponding read select port
- Synchronous write
  - the selected register (or memory location) is updated on the posedge clock transition when write enable is asserted
  - a write cannot affect the read output in between clock edges

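The read and write behavior above can be sketched in a few lines of Python. This is only a toy software model, not the course's RTL: the class name `RegisterFile` and the methods `set_write` and `posedge` are hypothetical names chosen for illustration, and the treatment of x0 as hardwired zero comes from the RV32I ISA rather than from this slide.

```python
class RegisterFile:
    """Toy model: 32 registers, combinational read, write on the clock edge."""

    def __init__(self):
        self.regs = [0] * 32
        self.pending = None  # (index, value) waiting for the next clock edge

    def read(self, index):
        # Combinational read: depends only on current contents and the select.
        return self.regs[index]

    def set_write(self, index, value, write_enable):
        # Drive the write port inputs; nothing is stored until posedge().
        self.pending = (index, value) if write_enable else None

    def posedge(self):
        # Synchronous write: commit the selected register on the rising edge.
        if self.pending is not None:
            index, value = self.pending
            if index != 0:  # x0 always reads as zero in RV32I
                self.regs[index] = value & 0xFFFFFFFF
            self.pending = None


rf = RegisterFile()
rf.set_write(5, 42, write_enable=True)
before = rf.read(5)  # old value: the write has not been clocked in yet
rf.posedge()
after = rf.read(5)   # new value is visible only after the edge
print(before, after)
```

The point of the sketch is the ordering: a read issued between `set_write` and `posedge` still returns the old contents.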
Simplifying Characteristics of RISC (RISC = Reduced Instruction Set Computer)
- Simple operations
  - 2-input, 1-output arithmetic and logical operations
  - few alternatives for accomplishing the same thing
- Simple data movements
  - ALU ops are register-to-register, never memory
  - load-store architecture, 1 addressing mode
- Simple branches
  - limited varieties of branch conditions and targets
  - PC offset
- Simple instruction encoding
  - all instructions encoded in the same number of bits
  - simple, fixed formats

RISC Instruction Processing
- 5 generic steps
  - instruction fetch (IF)
  - instruction decode and operand fetch (ID)
  - ALU/execute (EX)
  - memory access (MEM), not required by non-mem instructions
  - write-back (WB)
[Figure: datapath sketch labeled with the five steps: PC and instruction memory (IF), register file (ID), ALU (EX), data memory (MEM), and the write-back path to the registers (WB)]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Single-Cycle Datapath for RV32I ALU Instructions

Register-Register ALU Instructions
- Assembly (e.g., register-register addition)
    ADD rd, rs1, rs2
- Machine encoding (bit 31 on the left)
    0000000 (7-bit) | rs2 (5-bit) | rs1 (5-bit) | 000 (3-bit) | rd (5-bit) | 0110011 (7-bit)
- Semantics
    GPR[rd] ← GPR[rs1] + GPR[rs2]
    PC ← PC + 4
- Exceptions: none (ignore carry and overflow)
- Variations
  - Arithmetic: {ADD, SUB}
  - Compare: {signed, unsigned} x {Set if Less Than}
  - Logical: {AND, OR, XOR}
  - Shift: {Left, Right-Logical, Right-Arithmetic}

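As a rough illustration of the R-type field layout and the ADD semantics, the sketch below slices a 32-bit instruction word into its fields and applies the state update. The function names `decode_rtype` and `execute_add` are hypothetical; the example word 0x002081B3 is ADD x3, x1, x2 assembled by hand from the field positions.

```python
def decode_rtype(inst):
    """Slice a 32-bit R-type instruction word into its six fields."""
    return {
        "funct7": (inst >> 25) & 0x7F,
        "rs2": (inst >> 20) & 0x1F,
        "rs1": (inst >> 15) & 0x1F,
        "funct3": (inst >> 12) & 0x7,
        "rd": (inst >> 7) & 0x1F,
        "opcode": inst & 0x7F,
    }


def execute_add(gpr, pc, inst):
    """Apply GPR[rd] <- GPR[rs1] + GPR[rs2]; return the next PC (PC + 4)."""
    f = decode_rtype(inst)
    if (f["opcode"], f["funct3"], f["funct7"]) != (0b0110011, 0b000, 0b0000000):
        raise ValueError("not an ADD instruction")
    if f["rd"] != 0:  # x0 stays zero
        # Mask to 32 bits: carry and overflow are ignored.
        gpr[f["rd"]] = (gpr[f["rs1"]] + gpr[f["rs2"]]) & 0xFFFFFFFF
    return pc + 4


gpr = [0] * 32
gpr[1] = 0xFFFFFFFF
gpr[2] = 2
new_pc = execute_add(gpr, 0, 0x002081B3)  # ADD x3, x1, x2
print(hex(gpr[3]), new_pc)
```

The sum wraps around to 1 here, which is what "ignore carry and overflow" means in practice.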
ADD rd rs1 rs2
[Figure: PC, instruction memory, and register file, with the IF, ID, EX, MEM, and WB steps all inside one block of combinational state update logic]
    if MEM[PC] == ADD rd rs1 rs2
      GPR[rd] ← GPR[rs1] + GPR[rs2]
      PC ← PC + 4

R-Type ALU Datapath
[Figure: the PC addresses the instruction memory and a separate adder computes PC + 4; instruction bits [19:15], [24:20], and [11:7] drive read register 1, read register 2, and write register; funct3 and funct7 select the ALU operation; the ALU result returns to the register file write data port under RegWrite]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Reg-Immediate ALU Instructions
- Assembly (e.g., reg-immediate addition)
    ADDI rd, rs1, imm12
- Machine encoding (bit 31 on the left)
    imm[11:0] (12-bit) | rs1 (5-bit) | 000 (3-bit) | rd (5-bit) | 0010011 (7-bit)
- Semantics
    GPR[rd] ← GPR[rs1] + sign-extend(imm)
    PC ← PC + 4
- Exceptions: none (ignore carry and overflow)
- Variations
  - Arithmetic: {ADDI} (RV32I has no SUBI; subtract by adding a negative immediate)
  - Compare: {signed, unsigned} x {Set if Less Than Imm}
  - Logical: {ANDI, ORI, XORI}
  - Shifts by unsigned imm[4:0]: {SLLI, SRLI, SRAI}

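The sign-extension step is the only new ingredient relative to the register-register case. A minimal sketch, assuming the I-type layout with the 12-bit immediate in instruction bits [31:20]; the helper names `sign_extend` and `addi` are hypothetical, and the example word 0xFFF08093 is ADDI x1, x1, -1 assembled by hand.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def addi(gpr_rs1, inst):
    """Compute GPR[rs1] + sign-extend(imm12), wrapped to 32 bits."""
    imm12 = (inst >> 20) & 0xFFF  # immediate sits in instruction bits [31:20]
    return (gpr_rs1 + sign_extend(imm12, 12)) & 0xFFFFFFFF


print(sign_extend(0xFFF, 12))  # all-ones 12-bit pattern
print(sign_extend(0x7FF, 12))  # largest positive 12-bit immediate
print(addi(5, 0xFFF08093))     # ADDI x1, x1, -1 applied to the value 5
```

Adding an immediate of -1 is how a program subtracts 1, which is why a separate subtract-immediate instruction is unnecessary.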
ADDI rd rs1 immediate12
[Figure: PC, instruction memory, register file, a PC + 4 adder, and an adder producing the sum written back to the register file, with the IF, ID, EX, MEM, and WB steps all inside one block of combinational state update logic]
    if MEM[PC] == ADDI rd rs1 immediate
      GPR[rd] ← GPR[rs1] + sign-extend(immediate)
      PC ← PC + 4

Datapath for R- and I-type ALU Instructions
[Figure: the R-type datapath extended with a sign-extend unit; instruction bits [19:15], [24:20], and [11:7] select the registers; instruction bits [31:20] (12 bits) are sign-extended to 32 bits; a mux controlled by ALUSrc = isItype picks the second ALU input; opcode, funct3, and funct7 select the ALU operation]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Single-Cycle Datapath for Data Movement Instructions

Load Instructions
- Assembly (e.g., load 4-byte word)
    LW rd, offset12(base)
- Machine encoding (bit 31 on the left; base is the rs1 field)
    offset[11:0] (12-bit) | base (5-bit) | 010 (3-bit) | rd (5-bit) | 0000011 (7-bit)
- Semantics
    byte_address32 = sign-extend(offset12) + GPR[base]
    GPR[rd] ← MEM32[byte_address]
    PC ← PC + 4
- Exceptions: none for now
- Variations: LW, LH, LHU, LB, LBU
    e.g., LB  :: GPR[rd] ← sign-extend(MEM8[byte_address])
          LBU :: GPR[rd] ← zero-extend(MEM8[byte_address])
- Note: RV32I memory is byte-addressable, little-endian

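The difference between the load variations is only in how many bytes are read and how the result is widened to 32 bits. A small sketch, assuming memory is modeled as a Python `bytearray`; the function `load` and its `kind` strings (borrowed from the W/H/HU/B/BU naming used for LoadExtend in this lecture) are illustrative, not part of any real API.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def load(mem, byte_address, kind):
    """Return the 32-bit register value produced by LW/LH/LHU/LB/LBU."""
    size = {"W": 4, "H": 2, "HU": 2, "B": 1, "BU": 1}[kind]
    # Little-endian: the byte at the lowest address is least significant.
    raw = int.from_bytes(mem[byte_address:byte_address + size], "little")
    if kind in ("H", "B"):
        raw = sign_extend(raw, 8 * size)  # LH and LB sign-extend
    return raw & 0xFFFFFFFF               # LHU and LBU are already zero-extended


mem = bytearray([0x80, 0xFF, 0x34, 0x12])
for kind in ("W", "H", "HU", "B", "BU"):
    print(kind, hex(load(mem, 0, kind)))
```

With the byte 0x80 at address 0, LB yields 0xFFFFFF80 while LBU yields 0x00000080, and LW assembles the four bytes into 0x1234FF80.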
LW Datapath
[Figure: the ALU datapath plus a data memory; the ALU operation is add; ALUSrc = isItype selects the sign-extended 12-bit offset; the ALU result drives the data memory address; the memory read data returns to the register file write data port; the IF, ID, EX, MEM, and WB steps all sit inside one block of combinational state update logic]
    if MEM[PC] == LW rd offset12(base)
      EA = sign-extend(offset) + GPR[base]
      GPR[rd] ← MEM[EA]
      PC ← PC + 4

Store Instructions
- Assembly (e.g., store 4-byte word)
    SW rs2, offset12(base)
- Machine encoding (bit 31 on the left)
    offset[11:5] (7-bit) | rs2 (5-bit) | base (5-bit) | 010 (3-bit) | offset[4:0] (5-bit) | 0100011 (7-bit)
- Semantics
    byte_address32 = sign-extend(offset12) + GPR[base]
    MEM32[byte_address] ← GPR[rs2]
    PC ← PC + 4
- Exceptions: none for now
- Variations: SW, SH, SB
    e.g., SB :: MEM8[byte_address] ← (GPR[rs2])[7:0]

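Because the store offset is split across two instruction fields, the immediate must be reassembled before sign extension. The sketch below shows that step and the partial-width write for SB; `stype_offset` and `store` are hypothetical helper names, memory is again modeled as a `bytearray`, and the example word 0xFE20AE23 is SW x2, -4(x1) assembled by hand.

```python
def stype_offset(inst):
    """Reassemble the 12-bit S-type offset and sign-extend it."""
    imm = (((inst >> 25) & 0x7F) << 5) | ((inst >> 7) & 0x1F)
    return imm - 0x1000 if imm & 0x800 else imm


def store(mem, byte_address, value, size):
    """Write the low `size` bytes of value to memory, little-endian."""
    low_bits = value & ((1 << (8 * size)) - 1)
    mem[byte_address:byte_address + size] = low_bits.to_bytes(size, "little")


print(stype_offset(0xFE20AE23))  # offset of SW x2, -4(x1)

mem = bytearray(8)
store(mem, 0, 0x12345678, 4)     # SW: all four bytes
store(mem, 5, 0x12345678, 1)     # SB: only bits [7:0]
print(mem.hex())
```

Splitting the offset keeps rs1 and rs2 in the same bit positions as in R-type instructions, so the register file read selects need no format-dependent mux.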
SW Datapath
[Figure: the same datapath as for LW; the ALU operation is add; ALUSrc = isStype selects the sign-extended 12-bit offset; the ALU result drives the data memory address; the second register read port drives the memory write data; RegWrite = 0; the IF, ID, EX, MEM, and WB steps all sit inside one block of combinational state update logic]
    if MEM[PC] == SW rs2 offset12(base)
      EA = sign-extend(offset) + GPR[base]
      MEM[EA] ← GPR[rs2]
      PC ← PC + 4

Load-Store Datapath
[Figure: combined load and store datapath; RegWrite = !isStore; ALUSrc = isItype || isStype; the ALU operation is add; MemWrite = isStore; MemRead = isLoad; an extension unit on the immediate (ImmExtend) and another on the memory read data (LoadExtend)]
- ImmExtend = {Itype, ItypeU, Stype}
- LoadExtend = {W, H, HU, B, BU}
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Datapath for Non-Control-Flow Instructions
[Figure: the ALU and load-store datapaths merged; RegWrite = !isStore; ALUSrc = isItype || isStype; opcode, funct3, and funct7 select the ALU operation; MemWrite = isStore; MemRead = isLoad; MemtoReg = isLoad selects between the ALU result and the extended load data for the register write port; ImmExtend and LoadExtend control the two extension units]
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Single-Cycle Datapath for Control Flow Instructions

Jump Instruction
- Assembly
    JAL rd, imm21
  Note: implicit imm[0]=0
- Machine encoding (UJ-type, bit 31 on the left)
    imm[20|10:1|11|19:12] (20-bit) | rd (5-bit) | 1101111 (7-bit)
- Semantics
    target = PC + sign-extend(imm21)
    GPR[rd] ← PC + 4
    PC ← target
  How far can you jump?
- Exceptions: misaligned target (4-byte)
- Note: use JAL x0, label instead of BEQ x0, x0, label

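One way to approach the "how far" question is to unscramble the UJ-type immediate and look at the extremes of the resulting signed 21-bit value. A minimal sketch; `jal_offset` is a hypothetical helper, and the example words 0x008000EF (JAL x1, +8) and 0xFFDFF06F (JAL x0, -4) were assembled by hand from the field layout.

```python
def jal_offset(inst):
    """Reassemble the signed 21-bit JAL offset; imm[0] is implicitly 0."""
    imm = (
        (((inst >> 31) & 0x1) << 20)     # imm[20]    from inst[31]
        | (((inst >> 12) & 0xFF) << 12)  # imm[19:12] from inst[19:12]
        | (((inst >> 20) & 0x1) << 11)   # imm[11]    from inst[20]
        | (((inst >> 21) & 0x3FF) << 1)  # imm[10:1]  from inst[30:21]
    )
    return imm - (1 << 21) if imm & (1 << 20) else imm


print(jal_offset(0x008000EF))  # JAL x1, +8
print(jal_offset(0xFFDFF06F))  # JAL x0, -4

# Extremes of a signed 21-bit offset whose bit 0 is always 0.
farthest_back = -(1 << 20)
farthest_forward = (1 << 20) - 2
print(farthest_back, farthest_forward)
```

Under this encoding the reachable targets span roughly 1 MiB in either direction from the PC.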
Unconditional Jump Datapath
[Figure: a second adder computes PC + the 32-bit extended UJ immediate; a mux controlled by PCSrc = isJ selects between PC + 4 and that target; a mux controlled by PCtoReg = isJ steers PC + 4 to the register file write data port; memory read and write are disabled (0); ALU operation, ImmExtend, MemtoReg, and LoadExtend are don't-cares (X)]
    if MEM[PC] == JAL rd, immediate20
      GPR[rd] = PC + 4
      PC = PC + sign-extend(imm21)
What about JALR?
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

(Conditional) Branch Instructions
- Assembly (e.g., branch if equal)
    BEQ rs1, rs2, imm13
  Note: implicit imm[0]=0
- Machine encoding (bit 31 on the left)
    imm[12|10:5] (7-bit) | rs2 (5-bit) | rs1 (5-bit) | 000 (3-bit) | imm[4:1|11] (5-bit) | 1100011 (7-bit)
- Semantics
    target = PC + sign-extend(imm13)
    if GPR[rs1] == GPR[rs2] then PC ← target
    else PC ← PC + 4
  How far can you jump?
- Exceptions: misaligned target (4-byte) if taken
- Variations
    BEQ, BNE, BLT, BGE, BLTU, BGEU

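The branch semantics combine an immediate reassembly with a comparison selected by funct3. The sketch below is one possible software rendering, not the datapath itself: `branch_offset`, `to_signed`, `BCOND`, and `branch_next_pc` are hypothetical names, the funct3 values follow the RV32I encoding, and the example words are BEQ/BLT/BLTU x1, x2, +8 assembled by hand.

```python
def branch_offset(inst):
    """Reassemble the signed 13-bit branch offset; imm[0] is implicitly 0."""
    imm = (
        (((inst >> 31) & 0x1) << 12)    # imm[12]   from inst[31]
        | (((inst >> 7) & 0x1) << 11)   # imm[11]   from inst[7]
        | (((inst >> 25) & 0x3F) << 5)  # imm[10:5] from inst[30:25]
        | (((inst >> 8) & 0xF) << 1)    # imm[4:1]  from inst[11:8]
    )
    return imm - (1 << 13) if imm & (1 << 12) else imm


def to_signed(x):
    """View a 32-bit register value as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# funct3 -> condition on the two 32-bit register values
BCOND = {
    0b000: lambda a, b: a == b,                        # BEQ
    0b001: lambda a, b: a != b,                        # BNE
    0b100: lambda a, b: to_signed(a) < to_signed(b),   # BLT
    0b101: lambda a, b: to_signed(a) >= to_signed(b),  # BGE
    0b110: lambda a, b: a < b,                         # BLTU
    0b111: lambda a, b: a >= b,                        # BGEU
}


def branch_next_pc(pc, inst, gpr):
    """Return target if the branch is taken, else PC + 4."""
    rs1, rs2 = (inst >> 15) & 0x1F, (inst >> 20) & 0x1F
    taken = BCOND[(inst >> 12) & 0x7](gpr[rs1], gpr[rs2])
    return (pc + branch_offset(inst)) & 0xFFFFFFFF if taken else pc + 4


gpr = [0] * 32
gpr[1], gpr[2] = 0xFFFFFFFF, 1
print(hex(branch_next_pc(0x100, 0x00208463, gpr)))  # BEQ:  not equal
print(hex(branch_next_pc(0x100, 0x0020C463, gpr)))  # BLT:  -1 < 1 signed
print(hex(branch_next_pc(0x100, 0x0020E463, gpr)))  # BLTU: 0xFFFFFFFF vs 1
```

The same register values give different outcomes for BLT and BLTU, which is why the signed and unsigned variations are separate instructions.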
Conditional Branch Datapath
[Figure: the PCSrc mux selects the next PC from PC + 4, the branch target adder output (JAL and taken branch), and the ALU result (JALR); the branch target adder sums the PC from the instruction path with the extended immediate; ALUSrc = isItype || isStype || isJALR; the ALU performs sub when the instruction is Bxx and sends bcond to the branch control logic; RegWrite = 0 for branches]
- ImmExtend = {Itype, ItypeU, Stype, SBtype, Utype, UJtype}
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Adding Control to Datapath
[Figure 4.7 from book, Copyright 2018 Elsevier Inc. All rights reserved.]

Datapath Control Generation
[Figure: MEM[PC] feeds the Decode Logic, which generates the control signals ALUSrc, RegWrite, MemtoReg, PCtoReg, MemRead, MemWrite, ALUOp, ImmExtend, LoadExtend, and PCSrc]

Single-Bit Control Signals
- ALUSrc
    de-asserted: 2nd ALU input from 2nd GPR read port
    asserted:    2nd ALU input from immediate
    equation:    (opcode != Rtype) && (opcode != Btype)
- RegWrite
    de-asserted: GPR write disabled
    asserted:    GPR write enabled
    equation:    (opcode != sw) && (opcode != bxx)
- MemtoReg
    de-asserted: steer ALU result to GPR write port
    asserted:    steer memory load to GPR write port
    equation:    opcode == lw/h/b
- PCtoReg
    de-asserted: steer above result to GPR write port
    asserted:    steer PC+4 to GPR write port
    equation:    (opcode == jal) || (opcode == jalr)
- MemRead
    de-asserted: memory read disabled
    asserted:    memory read port returns load value
    equation:    opcode == lw/h/b
- MemWrite
    de-asserted: memory write disabled
    asserted:    memory write enabled
    equation:    opcode == sw/h/b

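The single-bit equations translate almost directly into code. The sketch below is a simplified decoder for illustration only: the `OP_*` constants are the standard RV32I major opcodes, the function name `control` is hypothetical, and, like the equations on the slide, it covers only the instruction classes discussed in this lecture.

```python
# RV32I major opcodes (instruction bits [6:0])
OP_RTYPE = 0b0110011   # register-register ALU
OP_ITYPE = 0b0010011   # register-immediate ALU
OP_LOAD = 0b0000011    # lw/h/b
OP_STORE = 0b0100011   # sw/h/b
OP_BRANCH = 0b1100011  # bxx
OP_JAL = 0b1101111
OP_JALR = 0b1100111


def control(opcode):
    """Evaluate the single-bit control equations for one opcode."""
    return {
        "ALUSrc": opcode not in (OP_RTYPE, OP_BRANCH),
        "RegWrite": opcode not in (OP_STORE, OP_BRANCH),
        "MemtoReg": opcode == OP_LOAD,
        "PCtoReg": opcode in (OP_JAL, OP_JALR),
        "MemRead": opcode == OP_LOAD,
        "MemWrite": opcode == OP_STORE,
    }


for name, op in [("load", OP_LOAD), ("store", OP_STORE), ("branch", OP_BRANCH)]:
    asserted = [sig for sig, on in control(op).items() if on]
    print(name, asserted)
```

Listing the asserted signals per instruction class is a quick sanity check against the datapath figures: a store, for example, should assert MemWrite and ALUSrc but never RegWrite.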
Multi-Bit Control Signals
- ALUOp
    options: ADD, SUB, AND, OR, XOR, NOR, LT, and Shift; bcond: EQ, NE, GE, LT
    equation:
      case opcode
        RTypeALU    : according to funct3, funct7[5]
        ITypeALU    : according to funct3 only (except shift)
        LW/SW/JALR  : ADD
        Bxx         : SUB and select bcond function
        otherwise   : pass through 2nd input
- ImmExtend
    options: Itype, ItypeU, Stype, SBtype, Utype, UJtype
    equation: select based on instruction format type
      (may want to have separate extension units for primary ALU and PC offset adder)
- PCSrc
    options: PC+4, PCadder, ALU
    equation:
      case opcode
        JAL       : PC + immediate
        JALR      : GPR + immediate
        Bxx       : taken ? (PC + immediate) : (PC + 4)
        otherwise : PC + 4
- LoadExtend
    options: W, H, HU, B, BU
    equation: case funct3 ...

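The PCSrc case analysis can be written out as a small next-PC function. This is a sketch under stated assumptions: the function name `next_pc` and its parameters are hypothetical, the immediate is assumed to be already sign-extended, and `taken` is assumed to come from the ALU's bcond output. Clearing bit 0 of the JALR target follows the RV32I ISA definition rather than this slide.

```python
OP_BRANCH = 0b1100011
OP_JAL = 0b1101111
OP_JALR = 0b1100111


def next_pc(opcode, pc, imm, gpr_rs1=0, taken=False):
    """Select the next PC; imm is the already sign-extended immediate."""
    if opcode == OP_JAL:
        target = pc + imm                    # PC adder
    elif opcode == OP_JALR:
        target = (gpr_rs1 + imm) & ~1        # ALU result, bit 0 cleared
    elif opcode == OP_BRANCH:
        target = pc + imm if taken else pc + 4
    else:
        target = pc + 4                      # every non-control-flow instruction
    return target & 0xFFFFFFFF               # wrap to 32 bits


print(hex(next_pc(OP_JAL, 0x100, -4)))
print(hex(next_pc(OP_JALR, 0x100, 3, gpr_rs1=0x2000)))
print(hex(next_pc(OP_BRANCH, 0x100, 16, taken=True)))
print(hex(next_pc(OP_BRANCH, 0x100, 16, taken=False)))
```

JAL and taken branches share the PC adder, while JALR reuses the primary ALU, which matches the three PCSrc options listed above.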
Architecture vs Microarchitecture
(levels run from conceptual at the top to physical at the bottom)
- Architectural Level
  - a clock has an hour hand and a minute hand, ...
  - a computer does ...????...
  You can read a clock without knowing how it works
- Microarchitecture Level
  - a particular clockwork has a certain set of gears arranged in a certain configuration
  - a particular computer design has a certain datapath and a certain control logic
- Realization Level
  - machined alloy gears vs stamped sheet metal
  - CMOS vs ECL vs vacuum tubes
[Computer Architecture, Blaauw and Brooks, 1997]