The multicycle datapath. Lecture 10 (Wed 10/15/2008) Finite-state machine for the control unit. Implementing the FSM

Dorothy Rice

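Lecture 10 (Wed 10/15/2008)

Announcements: Lab (hardware) due Fri Oct 17. HW #2 (MIPS programming) due Wed Oct 22. Midterm Fri Oct 24.

The multicycle datapath
[Figure: the multicycle datapath and its control signals: PCWrite, IorD, MemRead, MemWrite, IRWrite, RegDst, MemToReg, RegWrite, ALUSrcA, ALUSrcB, ALUOp, PCSource.]

Today's objectives:
Microprogramming
Extending the multi-cycle datapath
Multi-cycle performance

Finite-state machine for the control unit
Instruction fetch and PC increment: IorD = 0, MemRead = 1, IRWrite = 1, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 010, PCSource = 0, PCWrite = 1. Next: register fetch.
Register fetch and branch computation: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 010. The next state depends on Op.
Op = BEQ, branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 110, PCWrite = Zero, PCSource = 1.
Op = R-type, R-type execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = func. Then R-type writeback: RegWrite = 1, RegDst = 1, MemToReg = 0.
Op = LW/SW, effective address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 010.
Op = SW, memory write: MemWrite = 1, IorD = 1.
Op = LW, memory read: MemRead = 1, IorD = 1. Then register write: RegWrite = 1, RegDst = 0, MemToReg = 1.
After branch completion, R-type writeback, memory write or the lw register write, the machine returns to instruction fetch.

Implementing the FSM
This can be translated into a state table; here are the first two states (X = don't care).

Current      Input   Next               PC           Mem   Mem    IR     Reg  MemTo  Reg    ALU   ALU   ALU  PC
state        (Op)    state              Write  IorD  Read  Write  Write  Dst  Reg    Write  SrcA  SrcB  Op   Source
Instr fetch  X       Reg fetch          1      0     1     0      1      X    X      0      0     01    010  0
Reg fetch    BEQ     Branch compl       0      X     0     0      0      X    X      0      0     11    010  X
Reg fetch    R-type  R-type execute     0      X     0     0      0      X    X      0      0     11    010  X
Reg fetch    LW/SW   Compute eff addr   0      X     0     0      0      X    X      0      0     11    010  X

You can implement this the hard way. Represent the current state using flip-flops or a register. Find equations for the next state and (control signal) outputs in terms of the current state and input (instruction word).
Or you can use the easy way. Stick the whole state table into a memory, like a ROM. This would be much easier, since you don't have to derive equations.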
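
The "easy way" amounts to a table lookup, and software can mimic it. The following is a minimal Python sketch of the control FSM held as a lookup table. The state names ("fetch", "regfetch", ...) are hypothetical, and the opcodes are assumed to be the MIPS decimal values 0 (R-type), 4 (beq), 35 (lw) and 43 (sw). OUTPUTS plays the role of the ROM's output columns and NEXT plays the role of its next-state column. Read and write enables that a state does not mention default to 0, and every other unmentioned signal is treated as a don't-care.

```python
# Sketch: the multicycle control FSM stored as a lookup table ("the easy way").
# Assumptions: hypothetical state names; opcodes are MIPS decimal values.
R_TYPE, BEQ, LW, SW = 0, 4, 35, 43

# Enables default to 0; every other unmentioned signal is a don't-care.
ENABLES = ("PCWrite", "MemRead", "MemWrite", "IRWrite", "RegWrite")

# Output columns of the table: the control signals set in each state.
OUTPUTS = {
    "fetch":    {"IorD": 0, "MemRead": 1, "IRWrite": 1, "ALUSrcA": 0,
                 "ALUSrcB": 0b01, "ALUOp": 0b010, "PCSource": 0, "PCWrite": 1},
    "regfetch": {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b010},
    "beq":      {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b110,
                 "PCWrite": "Zero", "PCSource": 1},
    "rexec":    {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": "func"},
    "rwrite":   {"RegWrite": 1, "RegDst": 1, "MemToReg": 0},
    "effaddr":  {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b010},
    "memwrite": {"MemWrite": 1, "IorD": 1},
    "memread":  {"MemRead": 1, "IorD": 1},
    "lwwrite":  {"RegWrite": 1, "RegDst": 0, "MemToReg": 1},
}

# Next-state column, keyed by (state, opcode); None matches any opcode.
NEXT = {
    ("fetch", None): "regfetch",
    ("regfetch", BEQ): "beq",
    ("regfetch", R_TYPE): "rexec",
    ("regfetch", LW): "effaddr",
    ("regfetch", SW): "effaddr",
    ("effaddr", LW): "memread",
    ("effaddr", SW): "memwrite",
    ("rexec", None): "rwrite",
    ("memread", None): "lwwrite",
    ("beq", None): "fetch",
    ("rwrite", None): "fetch",
    ("memwrite", None): "fetch",
    ("lwwrite", None): "fetch",
}


def step(state, opcode):
    """One table lookup: return (control outputs, next state)."""
    outputs = dict.fromkeys(ENABLES, 0)
    outputs.update(OUTPUTS[state])
    key = (state, opcode) if (state, opcode) in NEXT else (state, None)
    return outputs, NEXT[key]


def trace(opcode):
    """List the states visited by one instruction, starting at fetch."""
    state, visited = "fetch", []
    while True:
        visited.append(state)
        _, state = step(state, opcode)
        if state == "fetch":
            return visited


print(trace(LW))  # ['fetch', 'regfetch', 'effaddr', 'memread', 'lwwrite']
```

Tracing a beq visits three states, R-type and sw instructions visit four, and lw visits five. Each of these states takes one clock cycle.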

Pitfalls of state machines
As mentioned last time, we could translate this state diagram into a state table, and then make a logic circuit or stick it into a ROM. This works pretty well for our small example, but designing a finite-state machine for a larger instruction set is much harder.
There could be many states in the machine. For example, some MIPS instructions need 20 stages to execute in some implementations, each of which would be represented by a separate state.
There could be many paths in the machine. For example, the DEC VAX from 1978 had nearly 300 opcodes... that's a lot of branching!
There could be many outputs. For instance, the Pentium Pro's integer datapath has 120 control signals, and the floating-point datapath has 285 control signals.
Implementing and maintaining the control unit for processors like these would be a nightmare. You'd have to work with large Boolean equations or a huge state table.

Motivation for microprogramming
Think of the control unit's state diagram as a little program. Each state represents a "command," or a set of control signals that tells the datapath what to do. Several commands are executed sequentially. "Branches" may be taken depending on the instruction opcode. The state machine "loops" by returning to the initial state.
Why don't we invent a special language for making the control unit? We could devise a more readable, higher-level notation rather than dealing directly with binary control signals and state transitions. We would design control units by writing "programs" in this language. We will depend on a hardware or software translator to convert our programs into a circuit for the control unit.

A good notation is very useful
Instead of specifying the exact binary values for each control signal, we will define a symbolic notation that's easier to work with. As a simple example, we might replace ALUSrcB = 01 with ALUSrcB = 4. We can also create symbols that combine several control signals together. Instead of IorD = 0, MemRead = 1, IRWrite = 1, it would be nicer to just say something like Read PC.

Microinstructions
For the MIPS multicycle datapath we could define microinstructions with eight fields: Label, ALU, Src1, Src2, Register, Memory, PCWrite and Next. These fields will be filled in symbolically, instead of in binary. They determine all the control signals for the datapath. There are only 8 fields because some of them specify more than one of the 12 actual control signals. A microinstruction corresponds to one execution stage, or one cycle. You can see that in each microinstruction, we can do something with the ALU, register file, memory, and program counter units.

Specifying ALU operations
ALU selects the ALU operation.
Add indicates addition for memory offsets or PC increments.
Sub performs source register comparisons for beq.
Func denotes the execution of R-type instructions.
SRC1 is either PC or A, for the ALU's first operand.
SRC2, the second ALU operand, can be one of four different values:
B, for R-type instructions and branch comparisons.
The constant 4, to increment the PC.
Extend, the sign-extended constant field, for memory address offsets.
Extshift, the sign-extended, shifted constant, for branch targets.
These correspond to the ALUOp, ALUSrcA and ALUSrcB control signals, except we use names like "Add" and not actual bits like "010".

Specifying register and memory actions
Register selects a register file action.
Read, to read from registers rs and rt of the instruction word.
Write ALU writes ALUOut into destination register rd.
Write MDR saves MDR into destination register rt.
Memory chooses the memory unit's action.
Read PC reads an instruction from address PC into IR.
Read ALU reads from address ALUOut into MDR.
Write ALU writes B to memory address ALUOut.

Specifying PC actions
PCWrite determines what happens to the PC.
ALU sets the PC to the ALU's result, used in incrementing the PC.
ALUOut-Zero writes ALUOut to the PC only if the ALU's Zero condition is true. This is used to complete a branch instruction.
Next determines the next microinstruction to be executed.
Seq causes the next sequential microinstruction to be executed.
Fetch returns to the initial instruction fetch stage.
Dispatch i is similar to a switch or case statement; it branches depending on the actual instruction word.

The first stage, the microprogramming way
Below are two lines of microcode to implement the first two multicycle execution stages, instruction fetch and register fetch.

Label  ALU  Src1  Src2      Register  Memory   PCWrite  Next
Fetch  Add  PC    4                   Read PC  ALU      Seq
       Add  PC    Extshift  Read                        Dispatch 1

The first line, labelled Fetch, involves several actions:
Read an instruction from memory address PC.
Use the ALU to compute PC + 4, and store it back in the PC.
Continue on to the next sequential microinstruction.
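
The translator described above can be sketched in a few lines. The following minimal Python sketch expands one symbolic microinstruction into control-signal values. The symbol tables are an assumption: they map the field values listed above onto the FSM's signals (ALUSrcB 00 = B, 01 = 4, 10 = Extend, 11 = Extshift; ALUOp 010 = add, 110 = subtract). The names Microinstruction, FIELDS and expand are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical symbol tables: field value -> control signals it sets.
# Encodings follow the FSM: ALUSrcB 00=B, 01=4, 10=Extend, 11=Extshift;
# ALUOp 010=add, 110=subtract, "func"=use the instruction's func field.
FIELDS = {
    "alu": {"Add": {"ALUOp": 0b010}, "Sub": {"ALUOp": 0b110},
            "Func": {"ALUOp": "func"}},
    "src1": {"PC": {"ALUSrcA": 0}, "A": {"ALUSrcA": 1}},
    "src2": {"B": {"ALUSrcB": 0b00}, "4": {"ALUSrcB": 0b01},
             "Extend": {"ALUSrcB": 0b10}, "Extshift": {"ALUSrcB": 0b11}},
    # "Read" sets nothing: the FSM's register fetch state asserts no
    # register-file control signal.
    "register": {"Read": {},
                 "Write ALU": {"RegWrite": 1, "RegDst": 1, "MemToReg": 0},
                 "Write MDR": {"RegWrite": 1, "RegDst": 0, "MemToReg": 1}},
    "memory": {"Read PC": {"IorD": 0, "MemRead": 1, "IRWrite": 1},
               "Read ALU": {"IorD": 1, "MemRead": 1},
               "Write ALU": {"IorD": 1, "MemWrite": 1}},
    "pcwrite": {"ALU": {"PCWrite": 1, "PCSource": 0},
                "ALUOut-Zero": {"PCWrite": "Zero", "PCSource": 1}},
}


@dataclass
class Microinstruction:
    label: str = ""
    alu: str = ""
    src1: str = ""
    src2: str = ""
    register: str = ""
    memory: str = ""
    pcwrite: str = ""
    next: str = "Seq"


def expand(mi):
    """Translate one symbolic microinstruction into control-signal values."""
    # Enables that no field mentions stay at 0.
    signals = dict.fromkeys(
        ("PCWrite", "MemRead", "MemWrite", "IRWrite", "RegWrite"), 0)
    for field, table in FIELDS.items():
        symbol = getattr(mi, field)
        if symbol:                      # an empty field sets nothing
            signals.update(table[symbol])
    return signals


fetch = Microinstruction("Fetch", "Add", "PC", "4", "", "Read PC", "ALU", "Seq")
print(expand(fetch))
```

Expanding the Fetch line produces IorD = 0, MemRead = 1, IRWrite = 1, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 010, PCSource = 0 and PCWrite = 1. These are the same outputs as the FSM's instruction fetch state. Grouping several signals under one symbol such as Read PC is what lets eight fields cover twelve signals.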

The second stage
The second line implements the register fetch stage:
Read registers rs and rt from the register file.
Pre-compute PC + (sign-extend(IR[15-0]) << 2) for branches.
Determine the next microinstruction based on the opcode of the current MIPS program instruction:

    switch (opcode) {
        case 4:  goto BEQ1;
        case 0:  goto Rtype1;
        case 43:
        case 35: goto Mem1;
    }

Completing a beq instruction

Label  ALU  Src1  Src2  Register  Memory  PCWrite      Next
BEQ1   Sub  A     B                       ALUOut-Zero  Fetch

Control would transfer to this microinstruction if the opcode was beq.
Compute A - B, to set the ALU's Zero bit if A = B.
Update the PC with ALUOut (which contains the branch target from the previous cycle) if Zero is set.
The beq is completed, so fetch the next instruction.
The 1 in the label BEQ1 reminds us that we came here via the first branch point ("dispatch table 1"), from the second execution stage.

Completing an arithmetic instruction

Label   ALU   Src1  Src2  Register   Memory  PCWrite  Next
Rtype1  Func  A     B                                 Seq
                          Write ALU                   Fetch

What if the opcode indicates an R-type instruction? The first cycle here performs an operation on registers A and B, based on the MIPS instruction's func field. The next stage writes the ALU output to register rd from the MIPS instruction word. We can then go back to the Fetch microinstruction, to fetch and execute the next MIPS instruction.

Completing transfer instructions

Label  ALU  Src1  Src2    Register   Memory     PCWrite  Next
Mem1   Add  A     Extend                                 Dispatch 2
SW2                                  Write ALU           Fetch
LW2                                  Read ALU            Seq
                          Write MDR                      Fetch

For both sw and lw instructions, we should first compute the effective memory address, A + sign-extend(IR[15-0]). Another dispatch or branch distinguishes between stores and loads. For sw, we store data (from B) to the effective memory address. For lw, we copy data from the effective memory address to register rt. In either case, we continue on to Fetch after we're done.
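
The sequencing rules in the Next field (Seq, Fetch, Dispatch i) can be checked with a small simulation. The following minimal Python sketch assumes a hypothetical layout that keeps only each microcode line's label and Next value, since the datapath fields do not affect sequencing. It writes the two dispatch tables as opcode-to-label dictionaries that mirror the switch statement for dispatch table 1.

```python
# Hypothetical microcode listing: (label, Next field) per line, in address order.
MICROCODE = [
    ("Fetch",  "Seq"),
    ("",       "Dispatch 1"),
    ("BEQ1",   "Fetch"),
    ("Rtype1", "Seq"),
    ("",       "Fetch"),
    ("Mem1",   "Dispatch 2"),
    ("SW2",    "Fetch"),
    ("LW2",    "Seq"),
    ("",       "Fetch"),
]

# Dispatch tables: MIPS opcode -> label of the target microinstruction.
DISPATCH = {
    1: {4: "BEQ1", 0: "Rtype1", 43: "Mem1", 35: "Mem1"},
    2: {43: "SW2", 35: "LW2"},
}

# Resolve labels to microcode addresses.
LABELS = {label: addr for addr, (label, _) in enumerate(MICROCODE) if label}


def next_address(addr, opcode):
    """Compute the address of the next microinstruction from the Next field."""
    nxt = MICROCODE[addr][1]
    if nxt == "Seq":
        return addr + 1
    if nxt == "Fetch":
        return LABELS["Fetch"]
    if nxt.startswith("Dispatch"):
        table = int(nxt.split()[1])
        return LABELS[DISPATCH[table][opcode]]
    raise ValueError(f"unknown Next value: {nxt}")


def cycles_for(opcode):
    """Count microinstructions (clock cycles) until control returns to Fetch."""
    addr, cycles = LABELS["Fetch"], 0
    while True:
        cycles += 1
        addr = next_address(addr, opcode)
        if addr == LABELS["Fetch"]:
            return cycles


for name, op in [("beq", 4), ("R-type", 0), ("sw", 43), ("lw", 35)]:
    print(name, cycles_for(op))
```

Running it reports 3 cycles for beq, 4 for R-type and sw, and 5 for lw. This matches the number of states each instruction passes through in the FSM.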

Microprogramming vs. programming
Microinstructions correspond to control signals. They describe what is done in a single clock cycle. These are the most basic operations available in a processor.
Microprograms implement higher-level MIPS instructions. MIPS assembly language instructions are comparatively complex, each possibly requiring multiple clock cycles to execute. But each complex MIPS instruction can be implemented with several simpler microinstructions.

Similarities with assembly language
Microcode is intended to make control unit design easier. We defined symbols like Read PC to replace binary control signals. A translator can convert microinstructions into a real control unit. The translation is straightforward, because each microinstruction corresponds to one set of control values.
This sounds similar to MIPS assembly language! We use mnemonics like lw instead of binary opcodes like 100011. MIPS programs must be assembled to produce real machine code. Each MIPS instruction corresponds to a 32-bit instruction word.

Managing complexity
It looks like all we've done is devise a new notation that makes it easier to specify control signals. That's exactly right! It's all about managing complexity. Control units are probably the most challenging part of CPU design. Large instruction sets require large state machines with many states, branches and outputs. Control units for multicycle processors are difficult to create and maintain. Applying programming ideas to hardware design is a useful technique.

Situations when microprogramming is bad
One disadvantage of microprograms is that looking up control signals in a ROM can be slower than generating them from simplified circuits.
Sometimes complex instructions implemented in hardware are slower than equivalent assembly programs written using simpler instructions. Complex instructions are usually very general, so they can be used more often. But this also means they can't be optimized for specific operands or situations.
Some microprograms just aren't written very efficiently. But since they're built into the CPU, people are stuck with them (at least until the next processor upgrade).
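
The translator step can be sketched as a tiny "microassembler". It packs the control values of one microinstruction into a single binary control word, much as an assembler packs a MIPS instruction into a 32-bit word. The 15-bit field order below is a hypothetical layout that follows the column order of the state table. A real control ROM would also need bits for the Next field and for the conditional PC write, and this sketch leaves both out.

```python
# Hypothetical 15-bit control-word layout, most significant field first,
# in the column order of the state table: (signal, width in bits).
LAYOUT = [("PCWrite", 1), ("IorD", 1), ("MemRead", 1), ("MemWrite", 1),
          ("IRWrite", 1), ("RegDst", 1), ("MemToReg", 1), ("RegWrite", 1),
          ("ALUSrcA", 1), ("ALUSrcB", 2), ("ALUOp", 3), ("PCSource", 1)]


def assemble(signals):
    """Pack numeric control-signal values into one integer control word.

    Missing signals (don't-cares) are encoded as 0 in this sketch.
    """
    word = 0
    for name, width in LAYOUT:
        value = signals.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# Control values of the instruction fetch state, from the state table.
FETCH = {"PCWrite": 1, "IorD": 0, "MemRead": 1, "MemWrite": 0, "IRWrite": 1,
         "RegWrite": 0, "ALUSrcA": 0, "ALUSrcB": 0b01, "ALUOp": 0b010,
         "PCSource": 0}

print(format(assemble(FETCH), "015b"))  # 101010000010100
```

A ROM-based control unit would store one such word per microinstruction address. Choosing don't-cares as 0 is only a convenience here, and any value would be acceptable for those bits.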

How microcode is used today
Modern CISC processors (like x86) use a combination of hardwired logic and microcode to balance design effort with performance.
Control for many simple instructions can be implemented in hardwired logic.
Less-used or very complex instructions are microprogrammed to make the design easier and more flexible.
In this way, designers observe the first law of performance: make the common case fast!

DEC VAX
The VAX was designed in 1978 by Digital Equipment Corporation. It has one of the most complex instruction sets ever. (Compiler technology wasn't very good back then, and they wanted to make assembly programming easier.) VMS, the VAX multiuser, cluster-based operating system, was designed by Dave Cutler, who was also in charge of Windows NT. The VAX had a 32-bit processor, seven years before Intel's 80386. The cycle time was 200ns: 5MHz! All of this cost $200,000.

The single-cycle datapath; what is the cycle time?
[Figure: the single-cycle datapath, annotated with functional-unit delays.]

Performance of a multicycle implementation
Let's assume the following delays for the major functional units.
[Figure: the multicycle datapath, annotated with functional-unit delays: 3ns for the memory and the ALU, 2ns for the register file.]

Comparing cycle times
The clock period has to be long enough to allow all of the required work to complete within the cycle.
In the single-cycle datapath, the required work was just the complete execution of any instruction. The longest instruction, lw, requires 13ns (3 + 2 + 3 + 3 + 2). So the clock cycle time has to be 13ns, for a 77MHz clock rate.
For the multicycle datapath, the required work is only a single stage. The longest delay is 3ns, for both the ALU and the memory. So our cycle time has to be 3ns, or a clock rate of 333MHz. The register file needs only 2ns, but it must wait an extra 1ns to stay synchronized with the other functional units.
The single-cycle cycle time is limited by the slowest instruction, whereas the multicycle cycle time is limited by the slowest functional unit.

Comparing instruction execution times
In the single-cycle datapath, each instruction needs an entire clock cycle, or 13ns, to execute.
With the multicycle CPU, different instructions need different numbers of clock cycles, and hence different amounts of time. A branch needs 3 cycles, or 3 × 3ns = 9ns. Arithmetic and sw instructions each require 4 cycles, or 12ns. Finally, a lw takes 5 stages, or 15ns.
We can make some observations about performance already. Loads take longer with this multicycle implementation, while all other instructions are faster than before. So if our program doesn't have too many loads, then we should see an increase in performance.

The gcc example
Let's assume the gcc instruction mix:

Instruction   Frequency
Arithmetic    48%
Loads         22%
Stores        11%
Branches      19%

In a single-cycle datapath, all instructions take 13ns to execute. The average execution time for an instruction on the multicycle processor works out to 12.09ns:
(48% × 12ns) + (22% × 15ns) + (11% × 12ns) + (19% × 9ns) = 12.09ns
The multicycle implementation is faster in this case, but not by much. The speedup here is only 7.5%: 13ns / 12.09ns ≈ 1.075.

This CPU is too simple
Our example instruction set is too simple to see large gains. All of our instructions need about the same number of cycles (3-5). The benefits would be much greater in a more complex CPU, where some instructions require many more stages than others.
For example, the 80x86 has instructions to push all the registers onto the stack in one shot (PUSHA). Pushing proceeds sequentially, register by register. Implementing this in a single-cycle datapath would be foolish, since the instruction would need a large amount of time to store each register into memory. But the 80x86 and VAX are multicycle processors, so these complex instructions don't slow down the cycle time or other instructions.
Also, recall the real discrepancy between memory speed and processor frequencies.
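
The cycle-time and gcc arithmetic above can be reproduced in a few lines. The following minimal Python sketch takes its delays from the lecture's assumptions (3ns for memory and the ALU, 2ns for the register file). It also uses the cycle counts per instruction class and the gcc mix given above. The variable names are illustrative.

```python
# Unit delays in ns, as assumed in the lecture.
MEM, ALU, REG = 3, 3, 2

# Single-cycle clock: must cover the slowest instruction, lw
# (instruction memory, register read, ALU, data memory, register write).
single_ns = MEM + REG + ALU + MEM + REG
# Multicycle clock: must cover the slowest single functional unit.
multi_ns = max(MEM, ALU, REG)

# gcc mix: instruction class -> (fraction, multicycle clock cycles).
GCC_MIX = {"arithmetic": (0.48, 4), "loads": (0.22, 5),
           "stores": (0.11, 4), "branches": (0.19, 3)}

# Average multicycle execution time per instruction, weighted by frequency.
avg_multi_ns = sum(frac * cycles * multi_ns for frac, cycles in GCC_MIX.values())
speedup = single_ns / avg_multi_ns

print(f"single-cycle: {single_ns} ns ({1000 / single_ns:.0f} MHz)")
print(f"multicycle:   {multi_ns} ns ({1000 / multi_ns:.0f} MHz)")
print(f"average multicycle time: {avg_multi_ns:.2f} ns, speedup {speedup:.3f}")
```

It should report 13 ns (77 MHz), 3 ns (333 MHz), an average of 12.09 ns and a speedup of about 1.075. You can change GCC_MIX to see how a load-heavy program narrows or reverses the gap.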

Multicycle wrap-up
A multicycle processor splits instruction execution into several stages, each of which requires one clock cycle. Each instruction can be executed in as few stages as necessary.
Multicycle is more complex than the single-cycle implementation. Extra multiplexers and temporary registers are needed. The control unit must generate sequences of control signals.
Microprogramming helps manage the complexity by aggregating control signals into groups and using symbolic names, just like assembly is easier than machine code.
Next time, we begin our foray into pipelining. The multicycle implementation makes a good launch point.
