RISC Architecture: Multi-Cycle Implementation

Virendra Singh
Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in

Computer Organization & Architecture
Lecture 13 (16 April 2013)

Example Processor: MIPS Instruction Subset
- Arithmetic and logical instructions: add, sub, or, and, slt
- Memory reference instructions: lw, sw
- Branch instructions: beq, j
16 Apr 2013, Computer Architecture@IIT Mandi

Instruction Format
- Simple instructions, all 32 bits wide
- Very structured, no unnecessary baggage
- Only three instruction formats:

  R:  op | rs1 | rs2 | rd | shamt | funct
  I:  op | rs1 | rd  | 16-bit address/data
  J:  op | 26-bit address

- Rely on the compiler to achieve performance

How Fast Can the Clock Be?
If every instruction is executed in one clock cycle, then:
- The clock period must be at least 8ns to accommodate the longest instruction, i.e., lw.
- This is a single-cycle machine.
- It is slower because many instructions take less than 8ns but are still allowed that much time.
Method of speeding up: use a multicycle datapath.

Multicycle Datapath
[Figure: multicycle datapath with PC, memory (address and data ports), instruction register (IR), memory data register (MDR), register file, A and B registers, constant 4, ALU, and ALUOut register.]
One-cycle data transfer paths (need registers to hold data).

Multicycle Datapath Requirements
- Only one ALU, since it can be reused.
- Single memory for instructions and data.
- Five registers added:
  - Instruction register (IR)
  - Memory data register (MDR)
  - Three ALU registers: A and B for inputs, and ALUOut for output

Multicycle Datapath
[Figure: complete multicycle datapath with control signals PCWrite etc., PCSource, IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, and ALUOp. IR bits 26-31 go to the control FSM; bits 21-25, 16-20, and 11-15 address the register file; bits 0-15 are sign extended and optionally shifted left 2; bits 0-25 are shifted left 2 and combined with PC bits 28-31; bits 0-5 go to ALU control.]

3 to 5 Cycles for an Instruction
Cycles per instruction type: R-type 4, memory reference 4 or 5, branch 3, J-type 3.

Step 1, instruction fetch (all types):
  IR ← Memory[PC]; PC ← PC+4
Step 2, instruction decode / register fetch (all types):
  A ← Reg(IR[21-25]); B ← Reg(IR[16-20])
  ALUOut ← PC + (sign extend IR[0-15]) << 2
Step 3, execution, address computation, branch and jump completion:
  R-type:      ALUOut ← A op B
  Mem. ref.:   ALUOut ← A + sign extend (IR[0-15])
  Branch:      if (A == B) then PC ← ALUOut
  J-type:      PC ← PC[28-31] || (IR[0-25] << 2)
Step 4, memory access or R-type completion:
  R-type:      Reg(IR[11-15]) ← ALUOut
  lw:          MDR ← M[ALUOut]
  sw:          M[ALUOut] ← B
Step 5, memory read completion:
  lw:          Reg(IR[16-20]) ← MDR

Cycle 1 of 5: Instruction Fetch (IF)
Read instruction into IR: M[PC] → IR
Control signals used:
  IorD = 0         select PC
  MemRead = 1      read memory
  IRWrite = 1      write IR
Increment PC: PC + 4 → PC
Control signals used:
  ALUSrcA = 0      select PC into ALU
  ALUSrcB = 01     select constant 4
  ALUOp = 00       ALU adds
  PCSource = 00    select ALU output
  PCWrite = 1      write PC

Cycle 2 of 5: Instruction Decode (ID)

  Bits:  31-26    25-21   20-16   15-11   10-6    5-0
  R:     opcode   reg 1   reg 2   reg 3   shamt   fncode
  I:     opcode   reg 1   reg 2   word address increment (bits 15-0)
  J:     opcode   word address jump (bits 25-0)

Control unit decodes instruction.
Datapath prepares for execution:
- R and I types: reg 1 → A reg, reg 2 → B reg
    No control signals needed
- Branch type: compute branch address in ALUOut
    ALUSrcA = 0      select PC into ALU
    ALUSrcB = 11     instr. bits 0-15, sign extended and shifted left 2, into ALU
    ALUOp = 00       ALU adds

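The field boundaries in the format table can be checked with a few shifts and masks. The sketch below is illustrative only: the function name `decode` and the field names `rs`, `rt`, `rd` (the usual MIPS names for reg 1, reg 2, reg 3) are assumptions, not part of the slides.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields (bit 0 is the LSB)."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31-26
        "rs":     (word >> 21) & 0x1F,   # bits 25-21 (reg 1)
        "rt":     (word >> 16) & 0x1F,   # bits 20-16 (reg 2)
        "rd":     (word >> 11) & 0x1F,   # bits 15-11 (reg 3, R-type only)
        "shamt":  (word >> 6) & 0x1F,    # bits 10-6
        "funct":  word & 0x3F,           # bits 5-0
        "imm16":  word & 0xFFFF,         # bits 15-0 (I-type)
        "addr26": word & 0x3FFFFFF,      # bits 25-0 (J-type)
    }

# Hypothetical example word: add with reg 1 = 9, reg 2 = 10, reg 3 = 8
word = (0 << 26) | (9 << 21) | (10 << 16) | (8 << 11) | (0 << 6) | 0x20
fields = decode(word)
print(hex(word), fields)
```

All fields are extracted unconditionally; which of them are meaningful depends on the opcode, which is exactly what the control unit works out during this cycle.
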
Cycle 3 of 5: Execute (EX)
R type: execute function on reg A and reg B, result in ALUOut
Control signals used:
  ALUSrcA = 1      A reg into ALU
  ALUSrcB = 00     B reg into ALU
  ALUOp = 10       instr. bits 0-5 control ALU
I type, lw or sw: compute memory address in ALUOut ← A reg + sign extend IR[0-15]
Control signals used:
  ALUSrcA = 1      A reg into ALU
  ALUSrcB = 10     instr. bits 0-15, sign extended, into ALU
  ALUOp = 00       ALU adds

Cycle 3 of 5: Execute (EX), continued
I type, beq: subtract reg B from reg A; if the result is zero, write ALUOut to PC
Control signals used:
  ALUSrcA = 1        A reg into ALU
  ALUSrcB = 00       B reg into ALU
  ALUOp = 01         ALU subtracts
  PCSource = 01      ALUOut to PC (takes effect only if zero = 1)
  PCWriteCond = 1    write PC if zero = 1
  Instruction complete, go to IF
J type: write jump address to PC ← IR[0-25] shifted left 2, preceded by the four leading bits of PC
Control signals used:
  PCSource = 10      select jump address
  PCWrite = 1        write PC
  Instruction complete, go to IF

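The two target-address computations can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, and `pc_plus_4` stands for the PC value after the increment done in the fetch cycle, which is the value the datapath uses here.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc_plus_4, imm16):
    """ALUOut = PC + (sign extend IR[0-15]) << 2, computed during decode."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32

def jump_target(pc_plus_4, addr26):
    """Four leading bits of PC followed by IR[0-25] shifted left 2."""
    return (pc_plus_4 & 0xF0000000) | ((addr26 & 0x3FFFFFF) << 2)

# Hypothetical addresses chosen for illustration
print(hex(branch_target(0x00400008, 0x0003)))   # forward branch by 3 words
print(hex(branch_target(0x00400008, 0xFFFE)))   # backward branch by 2 words
print(hex(jump_target(0x00400008, 0x0100000)))
```

The shift by 2 converts a word offset into a byte offset, so both targets are always word aligned.
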
Cycle 4 of 5: Reg Write/Memory
R type: write destination register from ALUOut
Control signals used:
  RegDst = 1       instr. bits 11-15 specify reg.
  MemtoReg = 0     ALUOut into reg.
  RegWrite = 1     write register
  Instruction complete, go to IF
I type, lw: read M[ALUOut] into MDR
Control signals used:
  IorD = 1         select ALUOut into mem. addr.
  MemRead = 1      read memory to MDR
I type, sw: write M[ALUOut] from B reg
Control signals used:
  IorD = 1         select ALUOut into mem. addr.
  MemWrite = 1     write memory
  Instruction complete, go to IF

Cycle 5 of 5: Reg Write
I type, lw: write MDR to Reg(IR[16-20])
Control signals used:
  RegDst = 0       instr. bits 16-20 are write reg.
  MemtoReg = 1     MDR to reg. file write input
  RegWrite = 1     write register
  Instruction complete, go to IF
For an alternative method of designing the datapath, see N. Tredennick, Microprocessor Logic Design, the Flowchart Method, Digital Press, 1987.

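The five cycles of lw can be traced with a small register-transfer sketch. It is only an illustration, assuming a word-addressed Python dict as memory and a list as the register file; the function name `run_lw` and the example values are hypothetical.

```python
def run_lw(pc, regs, memory):
    """Execute one lw instruction stored at memory[pc], one transfer step per cycle."""
    # Cycle 1 (IF): IR <- M[PC]; PC <- PC + 4
    ir = memory[pc]
    pc = pc + 4
    # Cycle 2 (ID): A <- Reg(IR[21-25]); B <- Reg(IR[16-20])
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]      # fetched as in the datapath, unused by lw
    # Cycle 3 (EX): ALUOut <- A + sign extend(IR[0-15])
    imm = ir & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    alu_out = (a + imm) & 0xFFFFFFFF
    # Cycle 4 (MEM): MDR <- M[ALUOut]
    mdr = memory[alu_out]
    # Cycle 5 (WB): Reg(IR[16-20]) <- MDR
    regs[(ir >> 16) & 0x1F] = mdr
    return pc, 5                     # new PC and number of cycles used

# Hypothetical program: lw into register 8 from address (register 9) + 8
regs = [0] * 32
regs[9] = 0x100
memory = {0: (35 << 26) | (9 << 21) | (8 << 16) | 8, 0x108: 42}
new_pc, cycles = run_lw(0, regs, memory)
print(new_pc, cycles, regs[8])
```

Each local variable (`ir`, `a`, `b`, `alu_out`, `mdr`) plays the role of one of the five added registers, which is why a value produced in one cycle is still available in the next.
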
1-bit Control Signals

  Signal name    Value = 0                         Value = 1
  RegDst         Write reg. # = bits 16-20         Write reg. # = bits 11-15
  RegWrite       No action                         Write reg. ← Write data
  ALUSrcA        First ALU operand ← PC            First ALU operand ← Reg. A
  MemRead        No action                         Mem. data output ← M[Addr.]
  MemWrite       No action                         M[Addr.] ← Mem. data input
  MemtoReg       Reg. file write input ← ALUOut    Reg. file write input ← MDR
  IorD           Mem. addr. ← PC                   Mem. addr. ← ALUOut
  IRWrite        No action                         IR ← Mem. data output
  PCWrite        No action                         PC is written
  PCWriteCond    No action                         PC is written if zero(alu) = 1

[Figure: gate logic combining zero(alu), PCWriteCond, and PCWrite into the single PC write enable "PCWrite etc."]

2-bit Control Signals

  Signal name   Value   Action
  ALUOp         00      ALU performs add
                01      ALU performs subtract
                10      Funct. field (bits 0-5 of IR) determines ALU operation
  ALUSrcB       00      Second input of ALU ← B reg.
                01      Second input of ALU ← 4 (constant)
                10      Second input of ALU ← bits 0-15 of IR, sign extended to 32 bits
                11      Second input of ALU ← bits 0-15 of IR, sign extended and shifted left 2 bits
  PCSource      00      ALU output (PC + 4) sent to PC
                01      ALUOut (branch target addr.) sent to PC
                10      Jump address IR[0-25] shifted left 2 bits, concatenated with PC+4[28-31], sent to PC

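The ALUOp encoding can be read as a small decoding function in front of the ALU. The sketch below assumes the standard MIPS funct encodings for the five R-type instructions of this subset; those codes, the function name `alu_control`, and the string operation names are not given in the slides.

```python
# Standard MIPS funct codes for the R-type subset (assumed, not from the slides)
FUNCT_TO_OP = {
    0b100000: "add",
    0b100010: "sub",
    0b100100: "and",
    0b100101: "or",
    0b101010: "slt",
}

def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and the 6-bit funct field to an ALU operation."""
    if alu_op == 0b00:
        return "add"                 # fetch, address computation, branch target
    if alu_op == 0b01:
        return "sub"                 # beq comparison
    if alu_op == 0b10:
        return FUNCT_TO_OP[funct]    # R-type: funct field decides
    raise ValueError("ALUOp value not used in this design")

print(alu_control(0b00, 0b101010))   # funct is ignored when ALUOp = 00
print(alu_control(0b10, 0b101010))
```

Keeping this decoding outside the main control FSM means the FSM only has to emit two bits, whatever the number of R-type operations.
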
Control: Finite State Machine
[Figure: top-level control FSM. Start → State 0, instruction fetch (clock cycle 1) → State 1, instruction decode and register fetch (clock cycle 2) → one of four sub-machines in clock cycles 3-5: FSM-M (memory access instr.), FSM-R (R-type instr.), FSM-B (branch instr.), FSM-J (jump instr.).]

State 0: Instruction Fetch (CC1)
[Figure: multicycle datapath with the signals active in state 0 marked: IorD=0, MemRead=1, IRWrite=1, ALUSrcA=0, ALUSrcB=01, ALUOp=00 (ALU adds), PCSource=00, PCWrite etc.=1.]

State 0 Control FSM Outputs
State 0, instruction fetch (entered from Start):
  MemRead = 1, ALUSrcA = 0, IorD = 0, IRWrite = 1, ALUSrcB = 01, ALUOp = 00, PCWrite = 1, PCSource = 00
State 1, instruction decode / register fetch / branch addr.:
  Outputs?

State 1: Instr. Decode/Reg. Fetch/Branch Address (CC2)
[Figure: multicycle datapath with the signals active in state 1 marked: ALUSrcA=0, ALUSrcB=11, ALUOp=00 (ALU adds).]

State 1 Control FSM Outputs
State 0, instruction fetch (IF), entered from Start:
  MemRead = 1, ALUSrcA = 0, IorD = 0, IRWrite = 1, ALUSrcB = 01, ALUOp = 00, PCWrite = 1, PCSource = 00
State 1, instruction decode (ID) / register fetch / branch addr.:
  ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
Transitions out of state 1, by opcode:
  lw or sw → FSM-M
  R-type → FSM-R
  BEQ → FSM-B
  J-type → FSM-J

State 1 (Opcode = lw) → FSM-M (CC3-5)
[Figure: multicycle datapath with the signals active for lw marked. CC3: ALUSrcA=1, ALUSrcB=10, ALUOp=00 (ALU adds). CC4: IorD=1, MemRead=1. CC5: MemtoReg=1, RegDst=0, RegWrite=1.]

State 1 (Opcode = sw) → FSM-M (CC3-4)
[Figure: multicycle datapath with the signals active for sw marked. CC3: ALUSrcA=1, ALUSrcB=10, ALUOp=00 (ALU adds). CC4: IorD=1, MemWrite=1.]

FSM-M (Memory Access)
From state 1, opcode = lw or sw:
  Compute memory address: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
  If opcode = lw:
    Read memory data: MemRead = 1, IorD = 1
    Write register: RegWrite = 1, MemtoReg = 1, RegDst = 0
  If opcode = sw:
    Write memory: MemWrite = 1, IorD = 1
To state 0 (instr. fetch)

State 1 (Opcode = R-type) → FSM-R (CC3-4)
[Figure: multicycle datapath with the signals active for R-type marked. CC3: ALUSrcA=1, ALUSrcB=00, ALUOp=10 (funct. code selects the ALU operation). CC4: RegDst=1 (write register from bits 11-15), MemtoReg=0, RegWrite=1.]

FSM-R (R-type Instruction)
From state 1, opcode = R-type:
  ALU operation: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
  Write register: RegWrite = 1, MemtoReg = 0, RegDst = 1
To state 0 (instr. fetch)

State 1 (Opcode = beq) → FSM-B (CC3)
[Figure: multicycle datapath with the signals active for beq marked. CC3: ALUSrcA=1, ALUSrcB=00, ALUOp=01 (ALU subtracts), PCSource=01; if zero, PCWrite etc.=1 and ALUOut is written to the PC.]

Write PC on Zero
[Figure: PC write logic for beq. With PCWriteCond = 1 and zero = 1, the output "PCWrite etc." = 1, so the PC is written.]

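The PC write enable can be sketched as a one-line Boolean function. This assumes the usual gating, an AND of PCWriteCond with zero followed by an OR with PCWrite, which is consistent with the signal definitions in this lecture; the function name is hypothetical.

```python
def pc_write_enable(pc_write, pc_write_cond, zero):
    """'PCWrite etc.' = PCWrite OR (PCWriteCond AND zero); all arguments are 0 or 1."""
    return pc_write | (pc_write_cond & zero)

# Truth table over all input combinations
for pc_write in (0, 1):
    for pc_write_cond in (0, 1):
        for zero in (0, 1):
            print(pc_write, pc_write_cond, zero,
                  pc_write_enable(pc_write, pc_write_cond, zero))
```

For beq only PCWriteCond is asserted, so the PC changes exactly when the subtraction yields zero; for fetch and jump PCWrite is asserted and the zero flag is irrelevant.
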
FSM-B (Branch)
From state 1, opcode = beq:
  Write PC on branch condition: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond = 1, PCSource = 01
  Branch condition: if A − B = 0, zero = 1
To state 0 (instr. fetch)

State 1 (Opcode = j) → FSM-J (CC3)
[Figure: multicycle datapath with the signals active for j marked. CC3: PCSource=10 selects the jump address (IR bits 0-25 shifted left 2, combined with PC bits 28-31) and PCWrite etc. writes it to the PC.]

Write PC
[Figure: PC write logic for j. With PCWrite = 1, the output "PCWrite etc." = 1 regardless of zero and PCWriteCond.]

FSM-J (Jump)
From state 1, opcode = jump:
  Write jump addr. in PC: PCWrite = 1, PCSource = 10
To state 0 (instr. fetch)

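Control FSM
[Figure: complete control FSM with ten states.]
  State 0: Instr. fetch / adv. PC (entered from Start) → state 1
  State 1: Instr. decode / reg. fetch / branch addr. → state 2 (lw or sw), 6 (R), 8 (B), 9 (J)
  State 2: Compute memory addr. → state 3 (lw) or state 5 (sw)
  State 3: Read memory data → state 4
  State 4: Write register → state 0
  State 5: Write memory data → state 0
  State 6: ALU operation → state 7
  State 7: Write register → state 0
  State 8: Write PC on branch condition → state 0
  State 9: Write jump addr. to PC → state 0
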
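The state diagram can be written as a next-state function and used to count cycles per instruction. This is a sketch under assumptions: the state numbering (2 = compute memory address, 3 = read memory, 4 = write register for lw, 5 = write memory, 6 and 7 = R-type, 8 = branch, 9 = jump) is read off the figure, and the opcode values are the standard MIPS encodings, which the slides do not list.

```python
# Standard MIPS opcodes (assumed, not from the slides)
RTYPE, LW, SW, BEQ, J = 0b000000, 0b100011, 0b101011, 0b000100, 0b000010

def next_state(state, opcode):
    """Next-state function of the ten-state control FSM."""
    if state == 0:
        return 1                          # fetch -> decode
    if state == 1:                        # decode: dispatch on opcode
        if opcode in (LW, SW):
            return 2
        if opcode == RTYPE:
            return 6
        if opcode == BEQ:
            return 8
        if opcode == J:
            return 9
        raise ValueError("undefined opcode")
    if state == 2:
        return 3 if opcode == LW else 5   # read memory (lw) or write memory (sw)
    if state == 3:
        return 4                          # lw: memory read -> register write
    if state == 6:
        return 7                          # R-type: ALU operation -> register write
    return 0                              # states 4, 5, 7, 8, 9 finish the instruction

def count_cycles(opcode):
    """Clock cycles from instruction fetch until the FSM returns to state 0."""
    state, cycles = 0, 0
    while True:
        state = next_state(state, opcode)
        cycles += 1
        if state == 0:
            return cycles

for name, opcode in [("lw", LW), ("sw", SW), ("R-type", RTYPE), ("beq", BEQ), ("j", J)]:
    print(name, count_cycles(opcode))
```

The `ValueError` branch marks the spot where an undefined opcode would have to be handled.
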
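Control FSM (Controller)
[Figure: controller built from combinational logic and four flip-flops (FF). Inputs to the logic: 6 opcode bits and the present state. Outputs: 16 control outputs and the next state. The flip-flops, driven by Reset and Clock, feed the next state back as the present state.]
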
Designing the Control FSM
- Encode states; need 4 bits for 10 states, e.g., state 0 is 0000, state 1 is 0001, and so on.
- Write a truth table for the combinational logic:

  Opcode   Present state   Control signals     Next state
  000000   0000            0001000110000100    0001
  ....     ....            ....                ....

- Synthesize a logic circuit from the truth table.
- Connect four flip-flops between the next state outputs and present state inputs.

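One row of the truth table can be generated by packing the outputs of a state into a 16-bit control word. The sketch below is illustrative: the field order (the ten 1-bit signals, then ALUOp, ALUSrcB, PCSource) is an inferred assumption, chosen because it reproduces the state-0 row shown in the table; unlisted signals are treated as 0, and the state numbers follow the ten-state FSM of this lecture.

```python
# (signal name, width in bits); the order is an assumption, see lead-in
FIELDS = [
    ("RegDst", 1), ("RegWrite", 1), ("ALUSrcA", 1), ("MemRead", 1),
    ("MemWrite", 1), ("MemtoReg", 1), ("IorD", 1), ("IRWrite", 1),
    ("PCWrite", 1), ("PCWriteCond", 1),
    ("ALUOp", 2), ("ALUSrcB", 2), ("PCSource", 2),
]

# Signals asserted in each state; anything not listed is 0
STATE_OUTPUTS = {
    0: {"MemRead": 1, "IRWrite": 1, "ALUSrcB": 0b01, "PCWrite": 1},
    1: {"ALUSrcB": 0b11},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUOp": 0b10},
    7: {"RegWrite": 1, "RegDst": 1},
    8: {"ALUSrcA": 1, "ALUOp": 0b01, "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

def control_word(state):
    """Return the 16 control outputs of a state as a bit string."""
    outputs = STATE_OUTPUTS[state]
    return "".join(format(outputs.get(name, 0), "0{}b".format(width))
                   for name, width in FIELDS)

for state in range(10):
    print(format(state, "04b"), control_word(state))
```

Because the outputs depend only on the present state, the opcode column matters only for the next-state bits; the control-signal column is the same for every opcode in a given state.
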
Block Diagram of a Processor
[Figure: controller (control FSM) connected to the datapath (PC, register file, registers, ALU).
 Controller inputs: Opcode (6 bits), zero, Overflow, Reset, Clock.
 Controller outputs: MemWrite, MemRead, ALUSrcA, ALUSrcB (2 bits), PCSource (2 bits), RegDst, RegWrite, MemtoReg, IorD, IRWrite, PCWrite, PCWriteCond, and ALUOp (2 bits) to ALU control.
 ALU control: inputs ALUOp (2 bits) and funct. [0,5]; 3-bit output to the ALU (labeled "ALUOp 3-bits").
 Datapath memory interface: Mem. addr., Mem. write data, Mem. data out.]

Exceptions or Interrupts
Conditions under which the processor may produce an incorrect result or may hang:
- Illegal or undefined opcode.
- Arithmetic overflow, divide by zero, etc.
- Out of bounds memory address.
EPC: 32-bit register that holds the affected instruction address.
Cause: 32-bit register that holds an encoded exception type. For example:
- 0 for undefined instruction
- 1 for arithmetic overflow

Implementing Exceptions
[Figure: datapath additions for exceptions. The Cause register (32-bit) is loaded with 0 or 1 when CauseWrite=1. The ALU subtracts 4 from the PC (ALUSrcA=0, ALUSrcB=01, ALUOp=01) and the result is written to EPC when EPCWrite=1. PCSource=11 selects the address 8000 0180 (hex), and PCWrite etc.=1 writes it to the PC. The ALU Overflow output goes to the control FSM.]

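The register transfers performed on an exception can be sketched as a single function. This is a minimal illustration: the function and constant names are hypothetical, and it assumes the PC has already been advanced by 4 in the fetch cycle, which is why 4 is subtracted to recover the address of the affected instruction.

```python
EXCEPTION_VECTOR = 0x80000180   # handler address selected by PCSource = 11
CAUSE_UNDEFINED = 0             # undefined instruction
CAUSE_OVERFLOW = 1              # arithmetic overflow

def take_exception(pc, cause_code):
    """Return (new PC, EPC, Cause) for an exception in the current instruction."""
    epc = (pc - 4) & 0xFFFFFFFF   # ALU subtracts 4 from the already-incremented PC
    cause = cause_code            # CauseWrite = 1
    new_pc = EXCEPTION_VECTOR     # PCWrite etc. = 1
    return new_pc, epc, cause

# Hypothetical case: overflow in the instruction at address 0x00400020
new_pc, epc, cause = take_exception(0x00400024, CAUSE_OVERFLOW)
print(hex(new_pc), hex(epc), cause)
```

With EPC and Cause saved, the handler can tell which instruction failed and why before deciding how to continue.
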
How Long Does It Take? Again
Assume control logic is fast and does not affect the critical timing. Major time components are ALU, memory read/write, and register read/write.
Time for hardware operations, suppose:
  Memory read or write    2ns
  Register read           1ns
  ALU operation           2ns
  Register write          1ns

Single-Cycle Datapath
  R-type                      6ns
  Load word (I-type)          8ns
  Store word (I-type)         7ns
  Branch on equal (I-type)    5ns
  Jump (J-type)               2ns
Clock cycle time = 8ns
Each instruction takes one cycle.

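The single-cycle times follow from adding the component delays along each instruction's path. The sketch below is one way to reproduce them; the breakdown of each instruction into hardware operations is an assumption (the slides give only the totals), and the dictionary names are hypothetical.

```python
# Component delays in ns, as supposed in this lecture
DELAY_NS = {"mem": 2, "reg_read": 1, "alu": 2, "reg_write": 1}

# Assumed sequence of hardware operations used by each instruction type
PATHS = {
    "R-type": ["mem", "reg_read", "alu", "reg_write"],
    "lw":     ["mem", "reg_read", "alu", "mem", "reg_write"],
    "sw":     ["mem", "reg_read", "alu", "mem"],
    "beq":    ["mem", "reg_read", "alu"],
    "j":      ["mem"],
}

latency_ns = {name: sum(DELAY_NS[op] for op in path) for name, path in PATHS.items()}
clock_ns = max(latency_ns.values())   # the slowest instruction sets the clock period

print(latency_ns)
print("clock period:", clock_ns, "ns")
```

Every instruction then occupies the full clock period, so a jump that needs 2ns still takes 8ns.
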
Multicycle Datapath
Clock cycle time is determined by the longest operation, ALU or memory:
Clock cycle time = 2ns
Cycles per instruction (CPI):
  lw        5 (10ns)
  sw        4 (8ns)
  R-type    4 (8ns)
  beq       3 (6ns)
  j         3 (6ns)

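The per-instruction times are just the cycle counts multiplied by the clock period, where the period is the slowest single hardware operation. A small sketch, with hypothetical variable names:

```python
# Component delays in ns, as supposed in this lecture
DELAY_NS = {"mem": 2, "reg_read": 1, "alu": 2, "reg_write": 1}
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3, "j": 3}

clock_ns = max(DELAY_NS.values())     # longest operation: ALU or memory
time_ns = {name: cycles * clock_ns for name, cycles in CYCLES.items()}

print("clock period:", clock_ns, "ns")
print(time_ns)
```

Note that lw now takes 10ns, longer than its 8ns in the single-cycle design, because the 1ns register operations are also given a full 2ns cycle.
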
CPI of a Computer

$$\mathrm{CPI} = \frac{\sum_k (\text{instructions of type } k) \times \mathrm{CPI}_k}{\sum_k (\text{instructions of type } k)}$$

where $\mathrm{CPI}_k$ = cycles for an instruction of type $k$.
Note: CPI is dependent on the instruction mix of the program being run. Standard benchmark programs are used for specifying the performance of CPUs.

Example
Consider a program containing:
  loads         25%
  stores        10%
  branches      11%
  jumps          2%
  arithmetic    52%

$$\mathrm{CPI} = 0.25 \times 5 + 0.10 \times 4 + 0.11 \times 3 + 0.02 \times 3 + 0.52 \times 4 = 4.12$$ for the multicycle datapath

$$\mathrm{CPI} = 1.00$$ for the single-cycle datapath

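The weighted average can be checked with a short function. The names `average_cpi`, `mix`, and `cycles` are hypothetical; the mix is given in percent, so the function divides by the total count as in the CPI definition.

```python
def average_cpi(mix, cycles):
    """CPI = sum_k(count_k * CPI_k) / sum_k(count_k)."""
    total = sum(mix.values())
    return sum(mix[k] * cycles[k] for k in mix) / total

# Instruction mix in percent and multicycle cycle counts from this lecture
mix = {"lw": 25, "sw": 10, "beq": 11, "j": 2, "R-type": 52}
cycles = {"lw": 5, "sw": 4, "beq": 3, "j": 3, "R-type": 4}

cpi = average_cpi(mix, cycles)
print(round(cpi, 2))
```

Passing counts instead of percentages gives the same result, since only the proportions matter.
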
Multicycle vs. Single-Cycle

$$\text{Performance ratio} = \frac{\text{Single-cycle time}}{\text{Multicycle time}} = \frac{(\mathrm{CPI} \times \text{cycle time})_{\text{single-cycle}}}{(\mathrm{CPI} \times \text{cycle time})_{\text{multicycle}}} = \frac{1.00 \times 8\,\text{ns}}{4.12 \times 2\,\text{ns}} = 0.97$$

Single cycle is faster in this case, but remember, performance ratio depends on the instruction mix.

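The dependence on instruction mix can be seen by evaluating the ratio for two CPI values. In this sketch the function name is hypothetical, and the second CPI value (3.5) is an invented figure for a branch- and jump-heavy program, used only for illustration.

```python
def performance_ratio(cpi_single, clock_single_ns, cpi_multi, clock_multi_ns):
    """Single-cycle time per instruction divided by multicycle time per instruction."""
    return (cpi_single * clock_single_ns) / (cpi_multi * clock_multi_ns)

# Mix from the example: multicycle CPI = 4.12
ratio_example = performance_ratio(1.00, 8, 4.12, 2)

# Hypothetical program with a lower multicycle CPI of 3.5
ratio_hypothetical = performance_ratio(1.00, 8, 3.5, 2)

print(round(ratio_example, 2), round(ratio_hypothetical, 2))
```

A ratio below 1 means the single-cycle machine finishes first; above 1, the multicycle machine does. With these clock periods the break-even point is a multicycle CPI of 4.
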
Thank You