ENGN1640: Design of Computing Systems
Topic 4: Single-Cycle Processor Design
Professor Sherief Reda
http://scale.engin.brown.edu
Electrical Sciences and Computer Engineering, School of Engineering, Brown University
Spring 24
[ material from Patterson & Hennessy and Harris ]

Processor organization (microarchitecture)
Multiple implementations for a single architecture:
- Single-cycle: each instruction executes in a single cycle.
- Multi-cycle: each instruction is broken up into a series of shorter steps.
- Pipelined: each instruction is broken up into a series of steps; multiple instructions execute at once.
- Superscalar: multiple instructions are fetched, decoded, and executed simultaneously.
Levels of abstraction (figure):
  Application Software   programs
  Operating Systems      device drivers
  Architecture           instructions, registers
  Microarchitecture      datapaths, controllers
  Logic                  adders, memories
  Digital Circuits       AND gates, NOT gates
  Analog Circuits        amplifiers, filters
  Devices                transistors, diodes
  Physics                electrons

Introduction
- CPU performance factors
  - Instruction count: determined by ISA and compiler
  - CPI and cycle time: determined by CPU hardware
- We will examine a number of MIPS implementations
  - A simplified single-cycle version
  - A more realistic pipelined version
- Simple subset, shows most aspects
  - Memory reference: lw, sw
  - Arithmetic/logical: add, sub, and, or, slt
  - Control transfer: beq, j

Architectural state
Determines everything about a processor:
- PC
- 32 registers
- Memory
[Figure: state elements: 32-bit PC register (input PC', output PC), instruction memory, register file (5-bit addresses, 32-bit data, WD3, WE3), data memory (WD, WE)]

Single-Cycle MIPS Processor
- Datapath
- Control

- Fetch instruction @ PC
- Decode instruction
- Fetch operands
- Execute instruction
- Store result
- Update PC

Single-Cycle Datapath: lw fetch
First consider executing lw.
STEP 1: Fetch instruction
[Figure: PC drives the instruction memory address; the instruction memory outputs Instr]

Single-Cycle Datapath: lw register read
STEP 2: Read source operands from register file
[Figure: Instr[25:21] (rs) drives a register file read address]

Single-Cycle Datapath: lw immediate
STEP 3: Sign-extend the immediate
[Figure: Instr[15:0] passes through the Sign Extend unit to produce SignImm]

Single-Cycle Datapath: lw address
STEP 4: Compute the memory address
[Figure: the ALU, steered by ALUControl[2:0], adds SrcA (register value) and SrcB (SignImm); outputs are Zero and ALUResult, and ALUResult drives the data memory address]

Single-Cycle Datapath: lw memory read
STEP 5: Read data from memory and write it back to register file
[Figure: ReadData from the data memory feeds WD3 of the register file; Instr[20:16] (rt) selects the destination register; RegWrite drives WE3]

Single-Cycle Datapath: lw PC increment
STEP 6: Determine the address of the next instruction
[Figure: an adder computes PCPlus4 = PC + 4, which feeds PC'; the value written to the register file is labeled Result]

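The six lw steps above can be traced in software. The following is a minimal behavioral sketch in Python, not a hardware description; the function name `execute_lw` and the use of dictionaries for the instruction and data memories are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to 32 bits (kept as an unsigned value)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16

def execute_lw(pc, imem, regs, dmem):
    """Run one lw through the six datapath steps; returns the next PC.

    imem and dmem are hypothetical word-addressed dicts, regs is a 32-entry list.
    """
    instr = imem[pc]                             # STEP 1: fetch instruction
    rs = (instr >> 21) & 0x1F                    # STEP 2: Instr[25:21] selects the base register
    src_a = regs[rs]
    sign_imm = sign_extend16(instr & 0xFFFF)     # STEP 3: sign-extend Instr[15:0]
    alu_result = (src_a + sign_imm) & MASK32     # STEP 4: ALU adds to form the address
    read_data = dmem.get(alu_result, 0)          # STEP 5: read data memory ...
    rt = (instr >> 16) & 0x1F
    if rt != 0:                                  # ... and write back to rt (register 0 stays 0)
        regs[rt] = read_data
    return (pc + 4) & MASK32                     # STEP 6: PCPlus4 becomes the next PC

# Example: lw $16, -4($9) with $9 = 0x1004 reads the word at address 0x1000
regs = [0] * 32
regs[9] = 0x1004
imem = {0x00400000: (35 << 26) | (9 << 21) | (16 << 16) | 0xFFFC}
dmem = {0x1000: 42}
next_pc = execute_lw(0x00400000, imem, regs, dmem)
```

Masking with `MASK32` mimics the 32-bit wrap-around of the hardware adder, which is what makes the negative offset work on unsigned values.
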
Single-Cycle Datapath: sw
Write data in rt to memory
[Figure: Instr[20:16] (rt) drives the second register file read address; its output, WriteData, feeds WD of the data memory; MemWrite drives WE]

Single-Cycle Datapath: R-type instructions
- Read from rs and rt
- Write ALUResult to register file
- Write to rd (instead of rt)
[Figure: datapath extended with three multiplexers: ALUSrc selects SrcB (register value or SignImm), RegDst selects WriteReg[4:0] from Instr[20:16] (rt) or Instr[15:11] (rd), and MemtoReg selects Result from ALUResult or ReadData. Control signals shown: RegWrite, RegDst, ALUSrc, ALUControl[2:0] (varies), MemWrite, MemtoReg]

Single-Cycle Datapath: beq
- Determine whether values in rs and rt are equal
- Calculate branch target address: BTA = (sign-extended immediate << 2) + (PC + 4)
[Figure: SignImm is shifted left by 2 and added to PCPlus4 to form PCBranch; PCSrc, derived from Branch and the ALU Zero output, selects between PCPlus4 and PCBranch for PC'. RegDst and MemtoReg are don't-cares (x) for beq]

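As a quick check of the branch target formula, here is a small Python sketch. The function names `branch_target` and `next_pc_beq` are hypothetical, and the equality test models the ALU subtracting the two operands and raising Zero.

```python
MASK32 = 0xFFFFFFFF

def branch_target(pc, imm16):
    """BTA = (sign-extended immediate << 2) + (PC + 4), wrapped to 32 bits."""
    imm16 &= 0xFFFF
    signed_imm = imm16 - 0x10000 if (imm16 & 0x8000) else imm16
    return (pc + 4 + (signed_imm << 2)) & MASK32

def next_pc_beq(pc, rs_val, rt_val, imm16):
    """Select PC' for beq: PCBranch if the operands are equal, else PCPlus4."""
    zero = ((rs_val - rt_val) & MASK32) == 0   # ALU subtract; Zero flag set on equality
    return branch_target(pc, imm16) if zero else (pc + 4) & MASK32
```

For example, with PC = 0x1000 and an immediate of 3, a taken branch goes to 0x1000 + 4 + 12 = 0x1010, while an immediate of 0xFFFF (that is, -1) branches back to 0x1000.
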
Complete single cycle processor [without jumps]
[Figure: complete single-cycle datapath with control unit, without jump support]

ALU control
- ALU used for
  - Load/Store: F = add
  - Branch: F = subtract
  - R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR

ALU control
- Assume 2-bit ALUOp derived from opcode
- Combinational logic derives ALU control

opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111

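The two-level ALU decoding can be sketched in Python as a lookup followed by a small ALU model. This is an illustrative sketch that assumes the standard Patterson & Hennessy encodings (4-bit ALU control, 2-bit ALUOp, 6-bit funct); the names `alu_control` and `alu` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# funct field -> ALU control, used only when ALUOp = 10 (R-type); assumed P&H encodings
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}

def alu_control(alu_op, funct=0):
    """Derive the ALU control code from ALUOp and, for R-type, the funct field."""
    if alu_op == 0b00:          # lw / sw: add to form the address
        return 0b0010
    if alu_op == 0b01:          # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:          # R-type: depends on funct
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError("unsupported ALUOp")

def to_signed(x):
    """Interpret a 32-bit value as two's complement."""
    x &= MASK32
    return x - (1 << 32) if (x & 0x80000000) else x

def alu(control, a, b):
    """Return (result, zero) for the six ALU functions listed on the slide."""
    if control == 0b0000:
        result = a & b
    elif control == 0b0001:
        result = a | b
    elif control == 0b0010:
        result = a + b
    elif control == 0b0110:
        result = a - b
    elif control == 0b0111:
        result = int(to_signed(a) < to_signed(b))
    elif control == 0b1100:
        result = ~(a | b)
    else:
        raise ValueError("unsupported ALU control code")
    result &= MASK32
    return result, int(result == 0)
```

Keeping the funct decoding separate from the opcode decoding mirrors the slide's split between the main control unit (which produces ALUOp) and the ALU control logic.
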
The main control unit
- Control signals derived from instruction

R-type       0          rs      rt      rd      shamt   funct
             31:26      25:21   20:16   15:11   10:6    5:0
Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0
Branch       4          rs      rt      address
             31:26      25:21   20:16   15:0

Annotations on the figure:
- Bits 31:26: opcode
- rs: always read
- rt: read, except for load
- rd (R-type) or rt (load): write for R-type and load
- address: sign-extend and add

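The field boundaries above translate directly into shifts and masks. The following Python sketch extracts every field from a 32-bit MIPS instruction word; the function name `decode_fields` and the dictionary layout are assumptions for illustration.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31:26
        "rs":     (instr >> 21) & 0x1F,   # bits 25:21
        "rt":     (instr >> 16) & 0x1F,   # bits 20:16
        "rd":     (instr >> 11) & 0x1F,   # bits 15:11 (R-type only)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10:6  (R-type only)
        "funct":  instr & 0x3F,           # bits 5:0   (R-type only)
        "imm":    instr & 0xFFFF,         # bits 15:0  (load/store/branch address)
    }

# Example: add $16, $17, $18 encodes as 0x02328020
fields = decode_fields(0x02328020)
```

The hardware extracts all fields in parallel regardless of instruction type; the control unit then decides which of them are meaningful.
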
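Main decoder

Instruction   Op[5:0]   RegWrite   RegDst   ALUSrc   Branch   Mem-read   MemWrite   MemtoReg   ALUOp[1:0]
R-type        000000    1          1        0        0        0          0          0          10
lw            100011    1          0        1        0        1          0          1          00
sw            101011    0          X        1        0        0          1          X          00
beq           000100    0          X        0        1        0          0          X          01
addi          001000    1          0        1        0        0          0          0          00
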
Implementing jumps

Jump   2       address
       31:26   25:0

- Jump uses word address
- Update PC with concatenation of
  - Top 4 bits of old PC
  - 26-bit jump address
  - 00
- Need an extra control signal decoded from opcode

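The concatenation can be written out as bit operations. This Python sketch takes the top 4 bits from PC + 4, which is an assumption based on the usual MIPS definition (the slide says "old PC"; the two agree unless PC + 4 crosses a 256 MB boundary). The function name `jump_target` is hypothetical.

```python
def jump_target(pc, addr26):
    """Form the jump target: {(PC + 4)[31:28], 26-bit address, 00}."""
    upper = (pc + 4) & 0xF0000000          # top 4 bits (assumed taken from PC + 4)
    return upper | ((addr26 & 0x03FFFFFF) << 2)   # word address -> byte address
```

For example, a jump at PC = 0x00400000 with a 26-bit address field of 0x0100004 lands at 0x00400010.
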
Datapath and control with jumps

Instruction   Op[5:0]   RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp[1:0]   Jump
j             000010    0          X        X        X        0          X          XX           1

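The main decoder is pure combinational logic, so it can be modeled as a lookup table keyed by opcode. The sketch below assumes the standard MIPS opcodes and the signal values from Harris & Harris; `None` stands in for a don't-care (X), and the name `main_decoder` is hypothetical.

```python
X = None  # don't-care

SIGNALS = ("RegWrite", "RegDst", "ALUSrc", "Branch",
           "MemWrite", "MemtoReg", "ALUOp", "Jump")

# opcode -> control signal values (assumed standard single-cycle MIPS decoder)
MAIN_DECODER = {
    0b000000: (1, 1, 0, 0, 0, 0, 0b10, 0),   # R-type
    0b100011: (1, 0, 1, 0, 0, 1, 0b00, 0),   # lw
    0b101011: (0, X, 1, 0, 1, X, 0b00, 0),   # sw
    0b000100: (0, X, 0, 1, 0, X, 0b01, 0),   # beq
    0b001000: (1, 0, 1, 0, 0, 0, 0b00, 0),   # addi
    0b000010: (0, X, X, X, 0, X, X, 1),      # j
}

def main_decoder(opcode):
    """Return the control signals for an opcode as a name -> value dict."""
    return dict(zip(SIGNALS, MAIN_DECODER[opcode]))
```

Note that signals which change architectural state (RegWrite, MemWrite) are never don't-cares: they must be 0 whenever an instruction should not write.
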
Processor performance
Program Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
                       = # instructions x CPI x T_C
- For the single-cycle processor, CPI = 1
- What is T_C?

Critical path
T_c is limited by the critical path (lw)
[Figure: complete single-cycle processor with the lw path highlighted: PC, instruction memory, register file, ALU, data memory, result multiplexer, register file write port]
Single-cycle critical path:
T_c = t_pcq_PC + t_mem + max(t_RFread, t_sext + t_mux) + t_ALU + t_mem + t_mux + t_RFsetup

Critical path delay

Element               Parameter    Delay (ps)
Register clock-to-Q   t_pcq_PC     30
Register setup        t_setup      20
Multiplexer           t_mux        25
ALU                   t_ALU        200
Memory read           t_mem        250
Register file read    t_RFread     150
Register file setup   t_RFsetup    20

Taking t_RFread as the larger term of the max:
T_c = t_pcq_PC + 2 t_mem + t_RFread + t_mux + t_ALU + t_RFsetup
    = [30 + 2(250) + 150 + 25 + 200 + 20] ps
    = 925 ps

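The arithmetic can be double-checked with a short Python sketch. The delay values are the ones consistent with the 925 ps total; the sign-extension delay `t_sext` is not listed in the table, so it is passed as a parameter with an assumed default of 0 ps. The function names and the 100-billion-instruction example are hypothetical.

```python
# Element delays in picoseconds (values consistent with the 925 ps total)
DELAYS_PS = {
    "t_pcq_PC": 30,
    "t_setup": 20,
    "t_mux": 25,
    "t_ALU": 200,
    "t_mem": 250,
    "t_RFread": 150,
    "t_RFsetup": 20,
}

def single_cycle_period_ps(d, t_sext=0):
    """Critical path of lw; t_sext is an assumed value, not given in the table."""
    return (d["t_pcq_PC"] + d["t_mem"]
            + max(d["t_RFread"], t_sext + d["t_mux"])
            + d["t_ALU"] + d["t_mem"] + d["t_mux"] + d["t_RFsetup"])

def execution_time_s(n_instructions, cpi, tc_ps):
    """Execution time = # instructions x CPI x T_c, converted from ps to seconds."""
    return n_instructions * cpi * tc_ps * 1e-12

tc = single_cycle_period_ps(DELAYS_PS)
runtime = execution_time_s(100e9, 1, tc)   # hypothetical 100-billion-instruction program
```

With these numbers the two memory accesses account for 500 ps of the period, which is why memory speed dominates the single-cycle clock rate.
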
Performance issues
- Longest delay determines clock period
  - Critical path: load instruction
  - Instruction memory -> register file -> ALU -> data memory -> register file
- Not feasible to vary period for different instructions
- Violates design principle
  - Making the common case fast

Summary
- Single-cycle processor design is simple
- Plenty of room for improvement:
  - Pipelining
  - Superscalar
- In Lab4 you will design, implement and boot your first processor!