Review Multicycle: What is Happening
[Figure: multicycle datapath. PC, memory (Address, Mem Data), instruction register with fields [31-26], [25-21], [20-16], [15-11] and [15-0], memory data register, register file, sign extend, shift left 2, registers A and B, ALU (Zero, Result) and ALUOut, with control signals PCWrite, IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemToReg, ALUSrcA, ALUSrcB, ALUOp and PCSource.]

Controlling The Multicycle Design
[Figure: the same multicycle datapath and control signals.]
Stage 1: Instruction fetch & PC increment
- IR = Mem[PC]
- PC = PC + 4
- Controls: PCWrite = 1, IorD = 0, MemRead = 1, IRWrite = 1, ALUSrcA = 0 (PC), ALUSrcB = 01 (constant 4), ALUOp = add, PCSource = 0 (ALU result)
[Figure: multicycle datapath during instruction fetch.]

Register File
- Devoting a whole cycle only to reading registers is a waste
[Figure: multicycle datapath.]
Stage 2: Reg fetch & branch target
- Read source registers
- Compute branch target address
- Controls: ALUSrcA = 0 (PC), ALUSrcB = 11 (sign-extended, shifted offset), ALUOp = add
[Figure: multicycle datapath during register fetch.]

Stage 3 (beq): Branch completion
- Use the target address computed in stage 2
- Check for equality of register contents
- Controls: ALUSrcA = 1 (A), ALUSrcB = 00 (B), ALUOp = sub, PCSource = 1 (ALUOut), PCWrite = Zero
[Figure: multicycle datapath during branch completion.]
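The arithmetic of stages 2 and 3 can be sketched in Python. This is only an illustration, assuming a 16-bit offset in IR[15-0] and 32-bit wraparound; the function names are hypothetical and do not come from the slides.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit address width


def sign_extend16(imm):
    """Sign-extend a 16-bit field to a signed Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc_plus_4, ir):
    """Stage 2: ALUOut = PC + (sign-extend(IR[15-0]) << 2).

    The PC was already incremented by 4 in stage 1.
    """
    offset = sign_extend16(ir & 0xFFFF) << 2
    return (pc_plus_4 + offset) & MASK32


def complete_beq(pc_plus_4, alu_out, a, b):
    """Stage 3: compute A - B; write ALUOut into the PC only if Zero is set."""
    zero = ((a - b) & MASK32) == 0
    return alu_out if zero else pc_plus_4


# beq $1, $2 with a word offset of 3, fetched from address 0x100
ir = (4 << 26) | (1 << 21) | (2 << 16) | 3
target = branch_target(0x104, ir)
print(hex(target), hex(complete_beq(0x104, target, 7, 7)))
```

The branch target is computed before the opcode is known, which is why the stage 2 addition costs nothing extra for non-branch instructions: the result is simply ignored.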
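Finite-state machine for the control unit
[Figure: state diagram. States, control outputs and transitions are listed below.]
- Instruction fetch and PC increment: IorD = 0, MemRead = 1, IRWrite = 1, ALUSrcA = 0, ALUSrcB = 01, ALUOp = add, PCSource = 0, PCWrite = 1. Next: Register fetch and branch computation.
- Register fetch and branch computation: ALUSrcA = 0, ALUSrcB = 11, ALUOp = add. Next: Branch completion if Op = BEQ, R-type execution if Op = R-type, Effective address computation if Op = LW/SW.
- Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = sub, PCWrite = Zero, PCSource = 1. Next: Instruction fetch.
- R-type execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = func. Next: R-type writeback.
- R-type writeback: RegWrite = 1, RegDst = 1, MemToReg = 0. Next: Instruction fetch.
- Effective address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = add. Next: Memory write if Op = SW, Memory read if Op = LW.
- Memory write: MemWrite = 1, IorD = 1. Next: Instruction fetch.
- Memory read: MemRead = 1, IorD = 1. Next: Register write.
- Register write: RegWrite = 1, RegDst = 0, MemToReg = 1. Next: Instruction fetch.
- All instructions are the same for stages 1 and 2

Comparing instruction execution times
- In the single-cycle datapath, each instruction needs an entire clock cycle, or 8ns, to execute
- With the multicycle CPU, different instructions need different numbers of clock cycles
  - A branch needs 3 cycles, or 3 x 2ns = 6ns
  - Arithmetic and sw instructions each require 4 cycles, or 8ns
  - Finally, a lw takes 5 cycles, or 10ns
- We can make some observations about performance already
  - Loads take longer with this multicycle implementation, branches are faster, and arithmetic and store instructions take the same 8ns as before
  - So if our program doesn't have too many loads relative to branches, then we should see an increase in performance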
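The cycle counts in the comparison follow directly from the state diagram: each state takes one clock cycle, so an instruction costs as many cycles as the states it visits before returning to fetch. A minimal Python sketch of that walk is shown here. The state names, the instruction class names and the 2ns cycle time (3 cycles = 6ns) are written out as assumptions of this sketch.

```python
CYCLE_NS = 2  # assumed cycle time, so that a 3-cycle branch takes 6ns

# Next-state function of the control FSM.
# "*" marks a transition taken regardless of the opcode.
NEXT_STATE = {
    "fetch": {"*": "reg_fetch"},
    "reg_fetch": {
        "beq": "branch_completion",
        "rtype": "rtype_execution",
        "lw": "effective_address",
        "sw": "effective_address",
    },
    "branch_completion": {"*": "fetch"},
    "rtype_execution": {"*": "rtype_writeback"},
    "rtype_writeback": {"*": "fetch"},
    "effective_address": {"lw": "memory_read", "sw": "memory_write"},
    "memory_write": {"*": "fetch"},
    "memory_read": {"*": "register_write"},
    "register_write": {"*": "fetch"},
}


def cycles_for(op):
    """Count the states visited by one instruction, one cycle per state."""
    state, cycles = "fetch", 0
    while True:
        cycles += 1
        edges = NEXT_STATE[state]
        state = edges.get(op, edges.get("*"))
        if state == "fetch":
            return cycles


for op in ("beq", "rtype", "sw", "lw"):
    n = cycles_for(op)
    print(f"{op}: {n} cycles, {n * CYCLE_NS}ns")
```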
The gcc example
- Let's assume the gcc instruction mix

  Instruction   Frequency
  Arithmetic    48%
  Loads         22%
  Stores        11%
  Branches      19%

- In a single-cycle datapath, all instructions take 8ns
- The average execution time for an instruction on the multicycle processor works out to 8.06ns:
  (48% x 8ns) + (22% x 10ns) + (11% x 8ns) + (19% x 6ns) = 3.84 + 2.20 + 0.88 + 1.14 = 8.06ns
- The multicycle implementation is actually slightly slower

Reconsider Memory's Role
- Memory is 50ns, implying single-cycle = 104ns, implying a 9.6MHz clock rate
- For multi-cycle with a cache, let the processor stall on a cache miss
  - Keep 2ns cycle time, or 500MHz clock rate
  - Instruction execution for GCC: 8.06ns
- Consider executing 10^9 instructions with 10^6 memory references: 50ns * 10^6 = 5 * 10^7 ns, for a total of 0.05 sec
  - single-cycle = 104 seconds
  - multi-cycle = memory time + instruction execution time = 0.05 + 8.06 seconds, for a total of 8.11 sec
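The two calculations above can be reproduced with a few lines of Python. The sketch assumes the mix 48% / 22% / 11% / 19%, the multicycle latencies 8 / 10 / 8 / 6 ns, a 104ns single-cycle clock, and 10^6 memory references at 50ns each; the function and variable names are illustrative only.

```python
# Assumed gcc instruction mix and multicycle latencies (ns).
MIX = {"arithmetic": 0.48, "loads": 0.22, "stores": 0.11, "branches": 0.19}
LATENCY_NS = {"arithmetic": 8, "loads": 10, "stores": 8, "branches": 6}


def average_ns(mix, latency_ns):
    """Weighted average execution time per instruction."""
    return sum(mix[kind] * latency_ns[kind] for kind in mix)


def total_seconds(instructions, ns_per_instruction, stall_ns=0.0):
    """Total run time: instruction execution time plus memory stall time."""
    return (instructions * ns_per_instruction + stall_ns) * 1e-9


avg = average_ns(MIX, LATENCY_NS)
stall_ns = 50 * 10**6                       # 10^6 memory references at 50ns
single = total_seconds(10**9, 104)          # every cycle stretched to 104ns
multi = total_seconds(10**9, avg, stall_ns) # 2ns cycles plus stalls

print(f"average: {avg:.2f}ns")
print(f"single-cycle: {single:.2f}s, multicycle: {multi:.2f}s")
```

The design point is that the single-cycle machine pays the slow memory on every instruction, while the multicycle machine with a cache pays it only on the references that actually go to memory.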
Return: Finite-state machine for control
[Figure: the control unit's state diagram, with the same states and control outputs as before: instruction fetch and PC increment, register fetch and branch computation, branch completion, R-type execution, R-type writeback, effective address computation, memory write, memory read, register write.]

Recall: Implementing the FSM
- The FSM can be translated into a state table; the first 2 states:

  Current State | Input (Op) | Next State       | Output (Control signals)
                |            |                  | PCWrite | IorD | MemRead | MemWrite | IRWrite | RegDst | RegWrite | ALUSrcA | ALUSrcB | ALUOp | PCSource | MemToReg
  Instr Fetch   | X          | Reg Fetch        | 1       | 0    | 1       | 0        | 1       | X      | 0        | 0       | 01      | add   | 0        | X
  Reg Fetch     | BEQ        | Branch compl     | 0       | X    | 0       | 0        | 0       | X      | 0        | 0       | 11      | add   | X        | X
  Reg Fetch     | R-type     | R-type execute   | 0       | X    | 0       | 0        | 0       | X      | 0        | 0       | 11      | add   | X        | X
  Reg Fetch     | LW/SW      | Compute eff addr | 0       | X    | 0       | 0        | 0       | X      | 0        | 0       | 11      | add   | X        | X

- You can implement this the hard way (you don't want to do this)
  - Represent the current state using flip-flops or a register
  - Find equations for the next state and (control signal) outputs in terms of the current state and input (instruction word)
- Or you can use the easy way
  - Stick the whole state table into a memory, like a ROM
  - This would be much easier, since you don't have to derive equations
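The "easy way" can be sketched as a lookup table addressed by the current state and the opcode. The Python below covers only the first two states, as in the state table. The state numbering, the 6-bit opcode in the low address bits, the signal encodings and all names are assumptions of this sketch; don't-care outputs are simply left out of each entry.

```python
# MIPS opcodes for the instruction classes handled here.
OPCODES = {"rtype": 0, "beq": 4, "lw": 35, "sw": 43}

# Hypothetical state numbering; any consistent assignment would do.
STATES = ["instr_fetch", "reg_fetch", "branch_compl",
          "rtype_execute", "compute_eff_addr"]
STATE_ID = {name: i for i, name in enumerate(STATES)}

# Control outputs per state (don't-care signals omitted).
FETCH_OUT = dict(PCWrite=1, IorD=0, MemRead=1, MemWrite=0, IRWrite=1,
                 RegWrite=0, ALUSrcA=0, ALUSrcB=0b01, ALUOp="add", PCSource=0)
REG_FETCH_OUT = dict(PCWrite=0, MemRead=0, MemWrite=0, IRWrite=0,
                     RegWrite=0, ALUSrcA=0, ALUSrcB=0b11, ALUOp="add")


def address(state, opcode):
    """ROM address: state number in the high bits, 6-bit opcode in the low bits."""
    return (STATE_ID[state] << 6) | (opcode & 0x3F)


def build_rom():
    rom = {}
    # In the fetch state the input is a don't-care, so every opcode
    # maps to the same row.
    for opcode in range(64):
        rom[address("instr_fetch", opcode)] = ("reg_fetch", FETCH_OUT)
    for op, nxt in (("beq", "branch_compl"), ("rtype", "rtype_execute"),
                    ("lw", "compute_eff_addr"), ("sw", "compute_eff_addr")):
        rom[address("reg_fetch", OPCODES[op])] = (nxt, REG_FETCH_OUT)
    return rom


ROM = build_rom()


def step(state, opcode):
    """One clock cycle: read next state and control outputs from the ROM."""
    return ROM[address(state, opcode)]


print(step("instr_fetch", OPCODES["lw"])[0])  # reg_fetch
print(step("reg_fetch", OPCODES["lw"])[0])    # compute_eff_addr
```

No next-state equations are derived anywhere; changing the controller means changing table entries, which is the advantage the slide points to.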
Motivation for microprogramming
- Think of the control unit's state diagram as a program
  - Each state represents a command, or a set of control signals that tells the datapath what to do
  - Several commands are executed sequentially
  - Branches may be taken depending on the instruction opcode
  - The state machine loops by returning to the initial state
- We could invent a special language for the control unit
  - We could devise a more readable, higher-level notation rather than dealing directly with binary control signals and state transitions
  - We would design control units by writing programs in this language
  - We would depend on a hardware or software translator to convert our programs into a circuit for the control unit

A good notation is very useful
- Instead of specifying the exact binary values for each control signal, we will define a symbolic notation that's easier to work with
- As a simple example, we might replace ALUSrcB = 01 with ALUSrcB = 4, meaning the constant 4
- We can also create symbols that combine several control signals together. Instead of IorD = 0, MemRead = 1, IRWrite = 1 it would be nicer to just say something like Read PC
Microinstructions

  Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next

- For the MIPS multicycle datapath we could define microinstructions with eight fields
  - These fields will be filled in symbolically, instead of in binary
  - They determine all the control signals for the datapath. There are only 8 fields because some of them specify more than one of the actual control signals
  - A microinstruction corresponds to one execution stage, or one cycle
- You can see that in each microinstruction, we can do something with the ALU, register file, memory, and program counter units

Specifying ALU operations
- ALU control selects the ALU operation
  - Add indicates addition for memory offsets or PC increments
  - Sub performs source register comparisons for beq
  - Func denotes the execution of R-type instructions
- Src1 is either PC or A, for the ALU's first operand
- Src2, the second ALU operand, can be one of four different values
  - B for R-type instructions and branch comparisons
  - 4, the constant to increment the PC
  - Extend, the sign-extended constant field for memory references
  - Extshift, the sign-extended, shifted constant for branch targets
- These correspond to the ALUOp, ALUSrcA and ALUSrcB control signals, except we use names like Add and not actual bit values
Specifying register and memory actions
- Register control selects a register file action
  - Read reads from registers rs and rt of the instruction word
  - Write ALU writes ALUOut into destination register rd
  - Write MDR saves MDR into destination register rt
- Memory chooses the memory unit's action
  - Read PC reads an instruction from address PC into IR
  - Read ALU reads data from address ALUOut into MDR
  - Write ALU writes B to memory address ALUOut

Specifying PC actions
- PCWrite control determines what happens to the PC
  - ALU sets PC to ALUOut, used in incrementing the PC
  - ALU-Zero writes ALUOut to PC only if the ALU's Zero condition is true. This is used to complete a branch instruction
- Next determines the next microinstruction to be executed
  - Seq causes the next sequential microinstruction to be executed
  - Fetch returns to the initial instruction fetch stage
  - Dispatch i is similar to a switch or case statement; it branches depending on the actual instruction word
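The eight-field format and its symbolic values can be captured in a small record type. The Python below is a sketch under the assumption that a blank field is written as an empty string; the class name, the field names and the validate helper are hypothetical.

```python
from typing import NamedTuple

# Symbolic values allowed in each field ("" means the field is left blank).
FIELD_VALUES = {
    "alu": {"", "Add", "Sub", "Func"},
    "src1": {"", "PC", "A"},
    "src2": {"", "B", "4", "Extend", "Extshift"},
    "register": {"", "Read", "Write ALU", "Write MDR"},
    "memory": {"", "Read PC", "Read ALU", "Write ALU"},
    "pcwrite": {"", "ALU", "ALU-Zero"},
    "next": {"Seq", "Fetch", "Dispatch 1", "Dispatch 2"},
}


class MicroInstruction(NamedTuple):
    label: str
    alu: str
    src1: str
    src2: str
    register: str
    memory: str
    pcwrite: str
    next: str


def validate(mi):
    """Return the names of fields whose value is outside the notation."""
    return [name for name, allowed in FIELD_VALUES.items()
            if getattr(mi, name) not in allowed]


fetch = MicroInstruction("Fetch", "Add", "PC", "4", "", "Read PC", "ALU", "Seq")
print(validate(fetch))                      # no invalid fields
print(validate(fetch._replace(src2="8")))   # src2 is not a legal symbol
```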
Microprogramming the first stage
- Below are two lines of microcode to implement the first two multicycle execution stages, instruction fetch and register fetch
- The first line, labeled Fetch, involves several actions
  - Read from memory address PC
  - Use the ALU to compute PC + 4, and return it to the PC
  - Continue on to the next sequential microinstruction

  Label | ALU control | Src1 | Src2     | Register control | Memory  | PCWrite control | Next
  Fetch | Add         | PC   | 4        |                  | Read PC | ALU             | Seq
        | Add         | PC   | Extshift | Read             |         |                 | Dispatch 1

The second stage

  Label | ALU control | Src1 | Src2     | Register control | Memory  | PCWrite control | Next
  Fetch | Add         | PC   | 4        |                  | Read PC | ALU             | Seq
        | Add         | PC   | Extshift | Read             |         |                 | Dispatch 1

- The second line implements the register fetch stage
  - Read registers rs and rt from the register file
  - Pre-compute PC + (sign-extend(IR[15-0]) << 2) for branches
  - Determine the next microinstruction based on the opcode of the current MIPS program instruction

  switch (opcode) {
    case 4:  goto BEQ1;
    case 0:  goto Rtype1;
    case 43:
    case 35: goto Mem1;
  }
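The switch statement above amounts to a small lookup table indexed by the opcode field IR[31-26]. A Python sketch of that first dispatch table follows; the label names with a trailing 1 (BEQ1, Rtype1, Mem1) and the helper names are assumptions of this sketch.

```python
# Dispatch table 1: MIPS opcode -> label of the next microinstruction.
DISPATCH_1 = {
    0: "Rtype1",   # R-type
    4: "BEQ1",     # beq
    35: "Mem1",    # lw
    43: "Mem1",    # sw
}


def opcode_of(ir):
    """Extract the 6-bit opcode from IR[31-26]."""
    return (ir >> 26) & 0x3F


def dispatch_1(ir):
    """Pick the next microinstruction label from the instruction word."""
    return DISPATCH_1[opcode_of(ir)]


print(dispatch_1(0x8D280004))  # lw  $t0, 4($t1)
print(dispatch_1(0x01095020))  # add $t2, $t0, $t1
```

An opcode that is missing from the table raises a KeyError here; real hardware would need some defined behavior for unimplemented instructions, which the slides do not discuss.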
Completing a beq instruction

  Label | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next
  BEQ1  | Sub         | A    | B    |                  |        | ALU-Zero        | Fetch

- Control would transfer to this microinstruction if the opcode was beq
  - Compute A-B, to set the ALU's Zero bit if A=B
  - Update PC with ALUOut (which contains the branch target from the previous cycle) if Zero is set
  - The beq is completed, so fetch the next instruction
- The 1 in the label BEQ1 reminds us that we came here via the first branch point ("dispatch table 1"), from the second execution stage

Completing an arithmetic instruction

  Label  | ALU control | Src1 | Src2 | Register control | Memory | PCWrite control | Next
  Rtype1 | func        | A    | B    |                  |        |                 | Seq
         |             |      |      | Write ALU        |        |                 | Fetch

- When the opcode indicates an R-type instruction
  - The first cycle performs an operation on registers A and B, based on the MIPS instruction's func field
  - The next stage writes the ALU output to register rd from the MIPS instruction word
- We can then go back to the Fetch microinstruction, to fetch and execute the next MIPS instruction
Completing data transfer instructions

  Label | ALU control | Src1 | Src2   | Register control | Memory    | PCWrite control | Next
  Mem1  | Add         | A    | Extend |                  |           |                 | Dispatch 2
  SW2   |             |      |        |                  | Write ALU |                 | Fetch
  LW2   |             |      |        |                  | Read ALU  |                 | Seq
        |             |      |        | Write MDR        |           |                 | Fetch

- For both sw and lw instructions, we should first compute the effective memory address, A + sign-extend(IR[15-0])
- Another dispatch or branch distinguishes between stores and loads
  - For sw, we store data (from B) to the effective memory address
  - For lw we copy data from the effective memory address to register rt
- In either case, we continue on to Fetch when done

Microprogramming vs. programming
- Microinstructions correspond to control signals
  - They describe what is done in a single clock cycle
  - These are the most basic operations available in a processor
- Microprograms implement higher-level MIPS instructions
  - MIPS assembly language instructions are comparatively complex, each possibly requiring multiple clock cycles to execute
  - But each complex MIPS instruction can be implemented with several simpler microinstructions
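How the Next field sequences a whole microprogram can be sketched with a tiny interpreter. The Python below lists the nine microinstructions in order, with each row reduced to a label, a one-line summary and its Next field; the summaries, the label suffixes and the instruction class names are assumptions of this sketch, and no datapath is simulated.

```python
# The microprogram in storage order: (label, summary, next).
MICROPROGRAM = [
    ("Fetch",  "Add PC 4; Read PC; PCWrite ALU",    "Seq"),
    ("",       "Add PC Extshift; Register Read",    "Dispatch 1"),
    ("BEQ1",   "Sub A B; PCWrite ALU-Zero",         "Fetch"),
    ("Rtype1", "Func A B",                          "Seq"),
    ("",       "Register Write ALU",                "Fetch"),
    ("Mem1",   "Add A Extend",                      "Dispatch 2"),
    ("SW2",    "Memory Write ALU",                  "Fetch"),
    ("LW2",    "Memory Read ALU",                   "Seq"),
    ("",       "Register Write MDR",                "Fetch"),
]

# Label -> row index, for the labeled rows only.
LABELS = {label: i for i, (label, _, _) in enumerate(MICROPROGRAM) if label}

# Dispatch tables: instruction class -> target label.
DISPATCH = {
    "Dispatch 1": {"beq": "BEQ1", "rtype": "Rtype1", "lw": "Mem1", "sw": "Mem1"},
    "Dispatch 2": {"sw": "SW2", "lw": "LW2"},
}


def trace(op):
    """Return the row indices executed for one instruction, one per cycle."""
    upc, path = LABELS["Fetch"], []
    while True:
        path.append(upc)
        nxt = MICROPROGRAM[upc][2]
        if nxt == "Fetch":          # instruction finished
            return path
        if nxt == "Seq":            # fall through to the next row
            upc += 1
        else:                       # Dispatch i: branch on the opcode
            upc = LABELS[DISPATCH[nxt][op]]


for op in ("beq", "rtype", "sw", "lw"):
    print(op, [MICROPROGRAM[i][1] for i in trace(op)])
```

Each traced row is one clock cycle, so the path lengths match the per-instruction cycle counts of the multicycle design.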
Similarities with assembly language
- Microcode is intended to make control unit design easier
  - We defined symbols like Read PC to replace binary control signals
  - A translator converts microinstructions into a real control unit
  - The translation is straightforward, because each microinstruction corresponds to one set of control values
- This sounds similar to MIPS assembly language!
  - We use mnemonics like lw instead of binary opcodes like 100011
  - MIPS programs must be assembled to produce real machine code
  - Each MIPS instruction corresponds to a 32-bit instruction word

Managing complexity
- It looks like all we've done is devise a new notation that makes it easier to specify control signals
- And that's exactly right! The issue is managing complexity
  - Control units are probably the most challenging part of CPU design
  - Large instruction sets require large state machines with many states, branches and outputs
  - Control units for multicycle processors are difficult to create and maintain
- Applying programming ideas to hardware design is a useful technique
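The translation step can be sketched as a function from symbolic fields to control signal values, much like an assembler maps mnemonics to bits. The Python below is only an illustration: the bit encodings (ALUSrcA 0 = PC, ALUSrcB 01 = constant 4, and so on), the symbolic ALUOp values, the representation of the conditional write as PCWrite = "Zero", and the function name are all assumptions of this sketch, not a specification from the slides.

```python
# Assumed encodings for the symbolic field values.
ALU_OP = {"Add": "add", "Sub": "sub", "Func": "func"}
SRC1 = {"PC": 0, "A": 1}
SRC2 = {"B": 0b00, "4": 0b01, "Extend": 0b10, "Extshift": 0b11}


def translate(alu="", src1="", src2="", register="", memory="", pcwrite=""):
    """Turn one symbolic microinstruction into a set of control values.

    Write enables default to 0; signals that are not mentioned are
    don't-cares and are left out of the result.
    """
    signals = dict(PCWrite=0, MemRead=0, MemWrite=0, IRWrite=0, RegWrite=0)
    if alu:
        signals["ALUOp"] = ALU_OP[alu]
    if src1:
        signals["ALUSrcA"] = SRC1[src1]
    if src2:
        signals["ALUSrcB"] = SRC2[src2]
    # Register "Read" needs no control signal in this sketch.
    if register == "Write ALU":
        signals.update(RegWrite=1, RegDst=1, MemToReg=0)
    elif register == "Write MDR":
        signals.update(RegWrite=1, RegDst=0, MemToReg=1)
    if memory == "Read PC":
        signals.update(IorD=0, MemRead=1, IRWrite=1)
    elif memory == "Read ALU":
        signals.update(IorD=1, MemRead=1)
    elif memory == "Write ALU":
        signals.update(IorD=1, MemWrite=1)
    if pcwrite == "ALU":
        signals.update(PCWrite=1, PCSource=0)
    elif pcwrite == "ALU-Zero":
        signals.update(PCWrite="Zero", PCSource=1)
    return signals


# The Fetch microinstruction: Add, PC, 4, Read PC, ALU.
print(translate("Add", "PC", "4", memory="Read PC", pcwrite="ALU"))
```

A single symbol such as Read PC expands to three signals at once, which is why eight fields are enough to cover all of the datapath's control signals.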
Cases when microprogramming is bad
- One disadvantage of microprograms is that looking up control signals in a ROM can be slower than generating them from simplified circuits
- Sometimes complex instructions implemented in hardware are slower than equivalent assembly programs written using simpler instructions
  - Complex instructions are usually very general, so they can be used more often. But this also means they can't be optimized for specific operands or situations
  - Some microprograms just aren't written very efficiently. But since they're built into the CPU, people are stuck with them (at least until the next processor upgrade)

How microcode is used today
- Modern CISC processors (like x86) use a combination of hardwired logic and microcode to balance design effort with performance
- Control for many simple instructions can be implemented in hardwired logic, which can be faster than reading a microcode ROM
- Less-used or very complex instructions are microprogrammed to make the design easier and more flexible (floats, divide)
- In this way, designers respect the first law of performance: Make the common case fast!