SUB NAME: COMPUTER ARCHITECTURE AND ORGANIZATION    BRANCH: ECE    SUB CODE: EC2303    YEAR/SEM: III / V

UNIT-III CONTROL DESIGN

PART-A

1. What are the advantages and disadvantages of hardwired and microprogrammed control? (AUC NOV 07)
Hardwired control
Advantages: Hardwired control works fast; a combinational circuit generates the control signals directly from the status of the input signals.
Disadvantages: The control unit design becomes very complex.
Microprogrammed control
Advantages: The design of the microprogram is less complex, and the microprogram is flexible.
Disadvantages: A microprogrammed CPU is slower and more expensive.

2. What is a microprogram? (AUC NOV 09)
A microinstruction explicitly or implicitly specifies the next microinstruction to be used, thereby providing the necessary information for micro-operation sequencing. A set of related microinstructions forms a microprogram.

3. State the differences between hardwired control and microprogrammed control. (AUC MAY 07, APR 11)
S.No | Hardwired control                             | Microprogrammed control
1    | Implemented in hardware                       | Implemented in software
2    | Digital circuits generate the control signals | The control signals are stored as bit patterns in a ROM
3    | It is the conventional design technique       | It is the modern design technique

4. What is hardwired control?
A hardwired control unit consists of combinational circuits that generate the various control signals. This scheme is used to minimize cost while achieving higher efficiency in terms of operation speed.

5. What is a microinstruction, and what factor should determine the length of the microinstruction? (AUC APR 11)
Every instruction in the CPU is implemented by a sequence of one or more sets of concurrent micro-operations. Each micro-operation is associated with a specific set of control lines which, when activated, cause that micro-operation to take place. The size of the microinstruction should be kept to a minimum so that the control memory required to store the microinstructions is small.

6. What is the drawback of assigning one bit to each control signal? (AUC NOV 12)
Assigning an individual bit to each control signal results in long microinstructions, because the number of required control signals is usually large.

7. What are static branch prediction and dynamic branch prediction? (AUC NOV 12)
In static branch prediction, the prediction is the same every time a given branch instruction is executed. In dynamic branch prediction, the prediction decision may change depending on execution history.

8. Define control word (CW) and control store.
A control word is a word whose individual bits represent the various control signals. The microroutines for all instructions in the instruction set of a computer are stored in a special memory called the control store.

9. Draw the structure of a hardwired control unit.

10. Draw the structure of a microprogrammed control unit.

11. What is a processor clock?
All the operations and data transfers within the processor take place within time periods defined by a clock cycle, called the processor clock.

12. What is meant by nanoprogramming?
Nanoprogramming uses a second level of control storage to reduce the number of control bits required to interpret an instruction set.

13. What is the purpose of the WMFC signal?
WMFC (Wait for Memory Function Completed) is a control signal that causes the processor's control circuitry to wait for the arrival of the MFC signal.

14. What is a microroutine?
The sequence of control words corresponding to the control sequence of a machine instruction constitutes a microroutine.

15. What is pipelining?
Pipelining is a technique of executing machine instructions concurrently by overlapping their execution; it is also called assembly-line operation.

16. Define hazard.
Any condition that causes the pipeline to stall is called a hazard.

PART-B

1. Explain the organization of hardwired control in detail. (AUC NOV 06)

Assume a sequence of n steps in which the control unit changes from one state to the next in each step, from S0 to Sn-1, and that the inputs are I0 to Ix-1. At each state, the set of outputs Oi,0 to Oi,m-1 depends on the current state and the inputs. The control unit generates its states by a sequential circuit based on the Moore state machine concept: each output signal Oi,0 to Oi,m-1 depends on the current state only.

Hardwired Control Unit
The outputs Oi,0 to Oi,m-1 are produced in the sequentially generated states. In each state, three types of control signals are generated:
1. Select function signals: f1, f2, ..., fy-2, fy-1
2. Select storage unit signals: r1, r2, ..., rz-2, rz-1
3. Select data route signals: d1, d2, ..., dr-2, dr-1

Moore state machine: the ith set of outputs is produced for a given set of inputs in the ith state.

Hardwired control unit organization: a sequence counter counting from 0 to n-1 drives an encoder, which generates the outputs Oi,0 to Oi,m-1 in the sequentially generated states.
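As a rough illustration of the Moore-machine idea, the following Python sketch models a control unit whose asserted signals are a function of the current sequence-counter value only. The signal sets mirror the step j (MAR <- PC) and step j+1 (PC <- PC + 4) control expressions worked out below; the class structure and the set representation of a control word are assumptions made only for this sketch.

# Minimal, illustrative Moore-machine control unit: the asserted control
# signals depend only on the current state (the sequence-counter value).
# The Python structure is an assumption for illustration; the signal names
# follow the step j / step j+1 expressions derived later in this answer.

CONTROL_ROM = {
    0: {"f1", "d1", "r1", "r4"},                        # step j  : MAR <- PC
    1: {"f1", "d1", "r6", "r5", "r8", "r7", "r1",
        "r10", "r9", "f2", "r11", "r12"},               # step j+1: PC <- PC + 4
}

class HardwiredControl:
    def __init__(self, num_states):
        self.num_states = num_states
        self.state = 0                                   # sequence counter c0..cn-1

    def outputs(self):
        # Moore property: the outputs depend on the current state only.
        return CONTROL_ROM.get(self.state, set())

    def step(self):
        signals = self.outputs()
        self.state = (self.state + 1) % self.num_states  # clock advances the counter
        return signals

if __name__ == "__main__":
    cu = HardwiredControl(num_states=2)
    for cycle in range(2):
        print(f"cycle {cycle}: asserted signals = {sorted(cu.step())}")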

The encoder takes inputs from the sequence counter, the instruction decoder, condition/status flags and external inputs. External inputs can come from an external interrupting source or from an external device requesting access to the external buses. Condition/status flags reflect the conditions set by earlier instructions.

Control Unit Organization
The encoder sends a reset signal after the end of an instruction and a stop signal to the sequencer after the last step. The encoder also sends a count-start signal to let the clock increment the counter during the processing of an instruction.

Control Unit Outputs
Three sets of control signals are output in the various states:
(1) Select function signals: f1, f2, ..., fy-2, fy-1
(2) Select storage unit signals: r1, r2, ..., rz-2, rz-1
(3) Select data route signals: d1, d2, ..., dr-2, dr-1

Operations by the control unit using the encoder
Output control signals in the first two steps:
(i)  Step j:     MAR <- PC
(ii) Step j + 1: PC <- PC + 4 (for 32-bit memory word alignment)

The sequence counter resets when an instruction transfers to the instruction decoder after decoding, and starts counting when the encoder activates the count signal. Let c0, c1, ..., cn-1 be the sequence counter outputs from the instant the count output of the encoder is activated.

Storage unit control signals r1 to r12:
r1. PC output control
r2. PC input control
r3. MAR output control
r4. MAR input control
r5. Constant 4 register output control

r6. Constant 4 register input control
r7. Arithmetic unit input X output control
r8. Arithmetic unit input X input control
r9. Arithmetic unit input Y output control
r10. Arithmetic unit input Y input control
r11. Arithmetic unit output Z input control
r12. Z output control

Data route select control signals d1 to d4:
d1. Internal bus control
d2. External address bus output control
d3. External data bus output control
d4. External data bus input control

Step j: MAR <- PC is implemented by the control signal combination
    r4 = c0.f1.d1.r1
Step j + 1: PC <- PC + 4 is implemented in the next sequence by the control signal combination
    r2 = c1.f1.d1.r6.r5.r8.r7.r1.r10.r9.f2.r11.r12

2. Explain the organization of a microprogrammed control unit. (AUC NOV 06, NOV 07, MAY 06, NOV 11, NOV 12)

Microprogramming is an orderly method of designing the control unit of a conventional computer. The term is based on the analogy between the sequence of transfers required to execute a machine instruction and the sequence of individual instructions in a conventional user program. Each step is called a microinstruction, and the complete set of steps required to process a machine instruction is called the microprogram.


Microcode Execution:
1. The op-code is decoded.
2. Microinstructions are retrieved from control memory (the control address register and the decoder serve as the address register and selection mechanism of the control unit).
3. The control address register locates the microinstruction to be retrieved from control memory.
4. The microinstruction register holds the retrieved microinstruction: the micro op-code and the address of the next microinstruction in control memory.
5. The current microinstruction is executed.
6. The address of the next microinstruction is entered into the control address register to retrieve the next microinstruction from control memory.
7. If all microinstructions have been executed, the op-code of the next conventional instruction is placed in the control address register; if not, the remaining microinstructions are executed.
8. Conditional jumps are implemented by letting the states of some conditional flip-flops modify the address of the next microinstruction to be retrieved.

VARIATIONS IN WILKES' IMPLEMENTATION
1. Microinstruction address selection.
2. Instruction set strategies (control word organization).

1) The control address register is modified to function as a counter when the next microinstruction follows the current address, and as a register when a branch is required (similar to Wilkes' scheme) - the control address register acts as a counter or a register.
2) Addition of a register to hold the address of the next microinstruction (the microprogram counter, uPC).
3) Control memory may be organized in two storage matrices, each with its own decoding tree, providing faster operation at lower cost, since the size of the decoder is reduced.
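The microinstruction fetch-execute cycle described in steps 1 to 8 above can be sketched in a few lines of Python. This is only an illustrative model under assumed microinstruction fields (a set of control signals plus a next-address field); it is not the exact Wilkes organization or any particular machine's format.

# Illustrative model of microprogrammed control: a control store holds
# microinstructions, a control address register (CAR) selects the current one,
# and each microinstruction carries the control signals to assert plus the
# address of its successor. Field names and contents are assumptions.

from dataclasses import dataclass

@dataclass
class MicroInstruction:
    control_signals: tuple    # signals asserted during this micro-step
    next_address: int         # address of the next microinstruction (-1 = end of routine)

# A tiny micro-routine, e.g. an instruction fetch: MAR<-PC, Read, PC<-PC+4, IR<-MDR
CONTROL_STORE = {
    0: MicroInstruction(("PCout", "MARin", "Read"), 1),
    1: MicroInstruction(("PCout", "Select4", "Add", "Zin"), 2),
    2: MicroInstruction(("Zout", "PCin", "WMFC"), 3),
    3: MicroInstruction(("MDRout", "IRin"), -1),
}

def run_microroutine(start_address):
    car = start_address                       # control address register
    while car != -1:
        mir = CONTROL_STORE[car]              # microinstruction register
        print(f"CAR={car}: assert {mir.control_signals}")
        car = mir.next_address                # sequencing via the next-address field

if __name__ == "__main__":
    run_microroutine(0)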


The next address for microcode execution may be specified by:
1) The address field of the current microinstruction
2) The uPC (microprogram counter)
3) An address from the address control store

Microinstruction formats:
1) No encoding
2) Some encoding
3) Complete encoding

ADVANTAGES & APPLICATIONS OF MICROPROGRAMMING
1) THE SYSTEMATIZATION OF CONTROL
2) IMPROVEMENT IN PERFORMANCE
   a) a high degree of parallelism in the data paths, e.g., multiple-bit microinstructions performed in one cycle
   b) a high degree of decision logic (in table search and sorting routines)
3) COMPUTER-SERIES COMPATIBILITY
   Compatibility of instruction sets between smaller and larger machines of a series, e.g., Intel 286, 386, Pentium; IBM Systems/309x; Motorola series.
4) EMULATION
   Emulation is the combined software/hardware interpretation of the machine instructions of one machine by another. The target machine's architecture is mapped onto the host machine.
   EMULATOR - a set of microprograms that interpret a particular instruction set or language L1. Computer C1 emulates computer C2 if it can interpret machine language L2.

3. List the differences between hardwired control and microprogrammed control. (APRIL/MAY 2008)

Hardwired control is a control mechanism that generates control signals using an appropriate finite state machine (FSM). Microprogrammed control is a control mechanism that generates control signals using a memory called the control storage (CS), which contains the control signals. Although microprogrammed control seems advantageous for CISC machines, since CISC requires systematic development of sophisticated control signals, there is no intrinsic difference between these two control mechanisms.

The pair "microinstruction register" and "control storage address register" can be regarded as the "state register" of a hardwired control unit. Note that the control storage can be regarded as a kind of combinational logic circuit: we can assign any 0/1 values to each output corresponding to each address, which can be regarded as the input of a combinational logic circuit. This is a truth table.

CISC can also be implemented using hardwired control. In the above sense, microprogrammed control is not always necessary to implement CISC machines; hardwired control can also be used to implement sophisticated CISC machines. The bases of this opinion are as follows:
- The same field configuration (state assignment) can be used for both types of control. This is clear because of the identification above.
- We can use any large FSM with a horizontal-microcode-like state assignment, since the delay of the hardwired control logic (FSM) does not matter as long as it is less than or equal to the delay of the data path (which includes adders, shifters and so on), because the control logic works in parallel with the data path.
- The horizontal-microcode-like state assignment has become very easy to implement because of the spread of hardware description languages (HDLs). In Verilog HDL, `define statements let us obtain a correct net-list for a large FSM in a very short time using an appropriate logic synthesizer. The "parameter" statement can also be used for the state assignment in Verilog HDL.

CISC and RISC are the two major types of ordinary SISD machines. Since hardwired control has historically been faster, both types of machines are implemented using hardwired control in the microcomputer design educational environment.

4. Explain superscalar operation. (AUC NOV 11, NOV 12)

A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently. Pipelining allows several instructions to be executed at the same time, but they have to be in different pipeline stages at a given moment. Superscalar architectures include all the features of pipelining but, in addition, several instructions can execute simultaneously in the same pipeline stage.

There are two typical approaches today for improving performance:
1. Superpipelining
2. Superscalar execution

Superpipelining is based on dividing the pipeline stages into sub-stages, thus increasing the number of instructions which are handled by the pipeline at a given moment. By dividing each stage into two, the clock cycle period t is reduced to half, t/2; hence, at maximum capacity, the pipeline produces a result every t/2 seconds. For a given architecture and the corresponding instruction set there is an optimal number of pipeline stages; increasing the number of stages beyond this limit reduces the overall performance. A solution to further improve speed is the superscalar architecture.

Superscalar Architectures
Superscalar architectures allow several instructions to be issued and completed per clock cycle. A superscalar architecture consists of a number of pipelines working in parallel. Depending on the number and kind of parallel units available, a certain number of instructions can be executed in parallel. In the following example a floating-point operation and two integer operations can be issued and executed simultaneously; each unit is pipelined and can execute several operations in different pipeline stages.

Limitations on Parallel Execution
The situations which prevent instructions from being executed in parallel by a superscalar architecture are very similar to those which prevent efficient execution on any pipelined architecture. The consequences of these situations are more severe for superscalar architectures than for simple pipelines, because the potential for parallelism in superscalars is greater and, thus, a greater opportunity is lost.

1. Resource conflicts:
- They occur if two or more instructions compete for the same resource (register, memory, functional unit) at the same time; they are similar to the structural hazards discussed with pipelines. By introducing several parallel pipelined units, superscalar architectures try to reduce some of the possible resource conflicts.

2. Control (procedural) dependency:
- The presence of branches creates major problems in assuring optimal parallelism.
- If instructions are of variable length, they cannot be fetched and issued in parallel; an instruction has to be decoded in order to identify the following one and to fetch it. Superscalar techniques are therefore efficiently applicable to RISCs, with fixed instruction length and format.

3. Data conflicts:
- Data conflicts are produced by data dependencies between instructions in the program. Because superscalar architectures provide great freedom in the order in which instructions can be issued and completed, data dependencies have to be considered with much attention.

Data Dependencies

All instructions in the window of execution may begin execution, subject to data dependence (and resource) constraints. Three types of data dependencies can be identified:
1. True data dependency
2. Output dependency
3. Anti-dependency (the last two are artificial dependencies)

True Data Dependency
A true data dependency exists when the output of one instruction is required as an input to a subsequent instruction:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R2,R4,R5    ; R2 <- R4 + R5
True data dependencies are intrinsic features of the user's program. They cannot be eliminated by compiler or hardware techniques. True data dependencies have to be detected and treated: the addition above cannot be executed before the result of the multiplication is available.

Output Dependency
An output dependency exists if two instructions write into the same location; if the second instruction writes before the first one, an error occurs:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R4,R2,R5    ; R4 <- R2 + R5
The same situation appears, for example, in the following code sequence:

    L2: move r3,r7
        lw   r8,(r3)
        add  r3,r3,4
        lw   r9,(r3)
        ble  r8,r9,L3
Here r3 is written by both the move and the add instructions, giving an output dependency.

Anti-dependency
An anti-dependency exists if an instruction uses a location as an operand while a following one writes into that location; if the first one is still using the location when the second one writes into it, an error occurs:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R3,R2,R5    ; R3 <- R2 + R5
In the same code sequence:
    L2: move r3,r7
        lw   r8,(r3)
        add  r3,r3,4
        lw   r9,(r3)
        ble  r8,r9,L3
the lw instructions read r3, which is then overwritten by the add, giving anti-dependencies.

The Nature of Output Dependency and Anti-dependency
Output dependencies and anti-dependencies are not intrinsic features of the executed program; they are not real data dependencies but storage conflicts. They are only a consequence of the manner in which the programmer or the compiler uses registers (or memory locations); they are produced by the competition of several instructions for the same register. In the previous examples the conflicts are produced only because:
- output dependency: R4 is used by both instructions to store the result;
- anti-dependency: R3 is used by the second instruction to store the result.
The examples could be written without dependencies by using additional registers:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R7,R2,R5    ; R7 <- R2 + R5
and
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R6,R2,R5    ; R6 <- R2 + R5

Register Renaming
Output dependencies and anti-dependencies can be treated like true data dependencies, as normal conflicts: such conflicts are solved by delaying the execution of an instruction until it can safely execute. Parallelism can be improved, however, by eliminating output dependencies and anti-dependencies, since they are not real data dependencies. They can be eliminated by automatically allocating new registers to values when such a dependency has been detected. This technique is called register renaming.

The output dependency is eliminated by allocating, for example, R6 to the value R2 + R5:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R4,R2,R5    ; R4 <- R2 + R5    (renamed: ADD R6,R2,R5 ; R6 <- R2 + R5)
The same is done for the anti-dependency below:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R3,R2,R5    ; R3 <- R2 + R5    (renamed: ADD R6,R2,R5 ; R6 <- R2 + R5)
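As a rough illustration of the renaming idea (not a description of any particular processor's renaming hardware), the following Python sketch maps each architectural destination register to a fresh physical register; the instruction representation and the physical register pool are assumptions made for this sketch only.

# Illustrative register-renaming sketch: every architectural destination
# register gets a fresh physical register, so later writers no longer conflict
# with earlier readers/writers (WAR and WAW disappear, true RAW remains).
# Instruction format and register pool are assumptions for illustration.

from itertools import count

def rename(program):
    """program: list of (opcode, dest, src1, src2) using architectural names."""
    phys = count()                              # generator of fresh physical registers
    mapping = {}                                # architectural name -> current physical name
    renamed = []
    for op, dest, src1, src2 in program:
        # Sources read the most recent mapping (true RAW dependencies are preserved).
        s1 = mapping.get(src1, src1)
        s2 = mapping.get(src2, src2)
        # The destination always gets a fresh physical register (removes WAW/WAR).
        d = f"P{next(phys)}"
        mapping[dest] = d
        renamed.append((op, d, s1, s2))
    return renamed

if __name__ == "__main__":
    prog = [("MUL", "R4", "R3", "R1"),          # R4 <- R3 * R1
            ("ADD", "R4", "R2", "R5"),          # output dependency on R4
            ("SUB", "R8", "R4", "R3")]          # true dependency on the ADD result
    for ins in rename(prog):
        print(ins)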

5. Explain the performance of pipelining.

A stall causes the pipeline performance to degrade from the ideal performance. The ideal CPI on a pipelined machine is almost always 1. Hence, the pipelined CPI is

    CPI pipelined = ideal CPI + pipeline stall cycles per instruction
                  = 1 + pipeline stall cycles per instruction

and the speedup from pipelining is

    Speedup = (CPI unpipelined / CPI pipelined) x (clock cycle unpipelined / clock cycle pipelined)

If we ignore the cycle-time overhead of pipelining and assume the stages are all perfectly balanced, then the cycle times of the two machines are equal and

    Speedup = CPI unpipelined / (1 + pipeline stall cycles per instruction)

If all instructions take the same number of cycles, which must also equal the number of pipeline stages (the depth of the pipeline), then the unpipelined CPI is equal to the depth of the pipeline, leading to

    Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)

If there are no pipeline stalls, this leads to the intuitive result that pipelining can improve performance by the depth of the pipeline. For example, with a 5-stage pipeline and an average of 0.5 stall cycles per instruction, the speedup over the unpipelined machine is 5 / 1.5, i.e. about 3.3.

6. Explain in detail about nanoprogramming. (AUC NOV 11)

Nanoprogramming uses a two-level control storage organization:
- The top level is a vertical-format memory.
- The output of the top-level memory drives the address register of the bottom (nano-level) memory.
- The nano memory uses the horizontal format and produces the actual control signal outputs.
The advantage of this approach is a significant saving in control memory size (bits); the disadvantage is more complexity and slower operation.

Nano-programmed machine
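The two-level organization just described can be modelled as a pair of table lookups, as in the Python sketch below. The store contents and widths are made-up assumptions for illustration; only the structure (short vertical micro words indexing wide horizontal nano words) follows the description above.

# Illustrative two-level (nano-programmed) control store: the micro level holds
# short words that are merely indices into the nano store, and the nano store
# holds the wide horizontal control words. Contents are assumptions.

# Nano store: each entry is a wide horizontal control word (a tuple of signals).
NANO_STORE = [
    ("PCout", "MARin", "Read"),
    ("PCout", "Select4", "Add", "Zin"),
    ("Zout", "PCin"),
]

# Micro store: each microinstruction is just a short pointer into the nano store,
# so identical control-word combinations are stored only once.
MICRO_STORE = [0, 1, 2, 0, 1, 2]   # e.g. two micro-routines sharing nano words

def control_word(micro_address):
    nano_address = MICRO_STORE[micro_address]   # first-level lookup (vertical format)
    return NANO_STORE[nano_address]             # second-level lookup (horizontal format)

if __name__ == "__main__":
    for addr in range(len(MICRO_STORE)):
        print(addr, control_word(addr))

Because several microinstructions can point at the same nano word, each wide control word is stored only once; this sharing is the source of the bit savings computed in the example that follows.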

Example: Suppose that a system is being designed with 200 control points and 2048 microinstructions, and assume that only 256 different combinations of control points are ever used.
- A single-level control memory would require 2048 x 200 = 409,600 storage bits.
- A nano-programmed system would use:
    Micro store of size 2048 x 8 = 16,384 bits (8 bits suffice to address the 256 nano words)
    Nano store of size 256 x 200 = 51,200 bits
    Total size = 67,584 storage bits
Nanoprogramming has been used in many CISC microprocessors.

Applications of Microprogramming
Microprogramming application: emulation
- The use of a microprogram on one machine to execute programs originally written to run on another (different!) machine.
- By changing the microcode of a machine, you can make it execute software from another machine.
- Commonly used in the past to permit new machines to continue to run old software; the VAX had two modes: normal mode and an emulation mode for the PDP-11.
- The Nanodata QM-1 machine was marketed with no native instruction set - a universal emulation engine.

7. Discuss the various hazards that might arise in a pipeline. What remedies are commonly adopted to overcome/minimize these hazards? (AUC NOV 12)

Hazard
In computer architecture, a hazard is a potential problem that can happen in a pipelined processor. It refers to the possibility of erroneous computation when a CPU tries to simultaneously execute multiple instructions which exhibit data dependence. There are typically three types of hazards: data hazards, structural hazards, and branching hazards (control hazards). Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being executed, and instructions may not be completed in the desired order.

A hazard occurs when two or more of these simultaneous (possibly out-of-order) instructions conflict.

1. Data hazards
   RAW - Read After Write
   WAR - Write After Read
   WAW - Write After Write
2. Structural hazards
3. Branch (control) hazards
4. Eliminating hazards (eliminating data hazards)
5. Eliminating branch hazards

Data hazards
A major effect of pipelining is to change the relative timing of instructions by overlapping their execution. This introduces data and control hazards. Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine.

Consider the pipelined execution of a sequence of instructions in which all the instructions after an ADD use the result of the ADD instruction (in R1); a reconstruction of the sequence is given below. The ADD instruction writes the value of R1 in its WB stage, but the SUB instruction reads the value during its ID stage. This problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it.
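The instruction sequence discussed here appeared as a figure in the original notes; it is the classic example from Hennessy and Patterson and can be reconstructed along the following lines (R1 is fixed by the discussion, the other register numbers are assumptions):
    ADD R1,R2,R3
    SUB R4,R1,R5
    AND R6,R1,R7
    OR  R8,R1,R9
    XOR R10,R1,R11
Each of the SUB, AND, OR and XOR instructions reads R1, which is written by the ADD.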

The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of clock cycle 5. Thus, the AND instruction, which reads its registers during cycle 4 (its ID stage), will receive the wrong result.

The OR instruction can be made to operate without incurring a hazard by a simple implementation technique: perform register-file reads in the second half of the cycle and writes in the first half. Because both the WB stage of the ADD and the ID stage of the OR fall in cycle 5, the write to the register file by the ADD is performed in the first half of the cycle, and the read of the registers by the OR is performed in the second half.

The XOR instruction operates properly, because its register read occurs in cycle 6, after the register write by the ADD.

We now look more closely at data hazards and consider the cases in which stalls cannot be eliminated. A hazard is created whenever there is a dependence between instructions, and they are close enough that the overlap caused by pipelining would change the order of access to an operand. Our example hazards have all been with register operands, but it is also possible to create a dependence by writing and reading the same memory location. In the DLX pipeline, however, memory references are always kept in order, preventing this type of hazard from arising. All the data hazards discussed here involve registers within the CPU.

By convention, the hazards are named by the ordering in the program that must be preserved by the pipeline. Consider two instructions i and j, with i occurring before j. The possible data hazards are:

RAW (read after write) - j tries to read a source before i writes it, so j incorrectly gets the old value. This is the most common type of hazard and the kind that forwarding is used to overcome.

WAW (write after write) - j tries to write an operand before it is written by i. The writes end up being performed in the wrong order, leaving the value written by i rather than the value written by j in the destination. This hazard is present only in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled. The DLX integer pipeline writes a register only in WB and so avoids this class of hazards. WAW hazards would be possible if we made the following two changes to the DLX pipeline: move the write-back for an ALU operation into the MEM stage, since the data value is available by then, and suppose that the data memory access takes two pipe stages. Unless this hazard is avoided, execution of such a sequence on the revised pipeline would leave the result of the first write (the LW) in R1, rather than the result of the ADD. Allowing writes in different pipe stages also introduces other problems, since two instructions can try to write during the same clock cycle. The DLX FP pipeline, which has writes in different stages and different pipeline lengths, has to deal with both write conflicts and WAW hazards.

WAR (write after read) - j tries to write a destination before it is read by i, so i incorrectly gets the new value.

This hazard occurs when there are some instructions that write results early in the instruction pipeline and other instructions that read a source late in the pipeline. Because of the natural structure of a pipeline, which typically reads values before it writes results, such hazards are rare. Pipelines for complex instruction sets that support auto-increment addressing and require operands to be read late in the pipeline could create WAR hazards. If we modified the DLX pipeline as in the above example and also read some operands late, such as the source value for a store instruction, a WAR hazard could occur: if the SW reads R2 during the second half of its MEM2 stage and the ADD writes R2 during the first half of its WB stage, the SW will incorrectly read and store the value produced by the ADD.

RAR (read after read) - this is not a hazard, since reading the same location twice in either order causes no error.

Structural hazards
A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time. A structural hazard might occur, for instance, if a program were to execute a branch instruction followed by a computation instruction. Because they are executed in parallel, and because branching is typically slow (requiring a comparison, program-counter-related computation, and writing to registers), it is quite possible (depending on the architecture) that the computation instruction and the branch instruction will both require the ALU (arithmetic logic unit) at the same time.

When a machine is pipelined, the overlapped execution of instructions requires pipelining of the functional units and duplication of resources to allow all possible combinations of instructions in the pipeline. If some combination of instructions cannot be accommodated because of a resource conflict, the machine is said to have a structural hazard. Common instances of structural hazards arise when:
- Some functional unit is not fully pipelined. Then a sequence of instructions using that unpipelined unit cannot proceed at the rate of one per clock cycle.
- Example 1: a machine may have only one register-file write port, but in some cases the pipeline might want to perform two writes in a clock cycle.
To resolve such a conflict, we stall the pipeline for one clock cycle when the conflicting access occurs. The effect of the stall is to occupy the resources for that instruction slot. (In the pipeline timing table showing how the stalls are implemented, instruction 1 is assumed not to be a data-memory reference, i.e. a load or store; otherwise instruction 3 could not start execution for the same reason.) Introducing stalls degrades performance.

There are two reasons for allowing structural hazards:
- To reduce cost.
- To reduce the latency of the unit: the shorter latency comes from the lack of pipeline registers, which introduce overhead.
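The RAW/WAW/WAR definitions above can be checked mechanically: given the registers each instruction reads and writes, a later instruction j has a hazard with an earlier instruction i exactly when their access sets overlap in the corresponding way. The Python sketch below is illustrative only; the instruction representation (plain read/write sets) is an assumption, not a real ISA model.

# Classify the possible data hazards between an earlier instruction i and a
# later instruction j from their register read/write sets.

def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    hazards = []
    if i_writes & j_reads:
        hazards.append("RAW")   # j reads something i writes
    if i_writes & j_writes:
        hazards.append("WAW")   # both write the same location
    if i_reads & j_writes:
        hazards.append("WAR")   # j writes something i still has to read
    return hazards

if __name__ == "__main__":
    # ADD R1,R2,R3 followed by SUB R4,R1,R5  ->  RAW on R1
    print(classify_hazards(i_reads={"R2", "R3"}, i_writes={"R1"},
                           j_reads={"R1", "R5"}, j_writes={"R4"}))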

Branch (control) hazards
Branching hazards (also known as control hazards) occur when the processor is told to branch - i.e., if a certain condition is true, then jump from one part of the instruction stream to another - not necessarily to the next instruction sequentially. In such a case, the processor cannot tell in advance whether it should process the next instruction (it may instead have to move to a distant instruction).

A cache miss is another source of stalls: a cache miss stalls all the instructions in the pipeline, both before and after the instruction causing the miss. Eliminating a hazard, in contrast, often requires that some instructions in the pipeline be allowed to proceed while others are delayed. When an instruction is stalled, all the instructions issued later than the stalled instruction are also stalled. Instructions issued earlier than the stalled instruction must continue, since otherwise the hazard can never be cleared. A hazard causes pipeline bubbles to be inserted.

Handling control hazards is very important:
- VAX: Emer and Clark report that 39% of instructions change the PC; a naive solution adds approximately 5 cycles every time, i.e. adds about 2 to the CPI (a ~20% increase).
- DLX: Hennessy and Patterson report 13% branches; a naive solution adds 3 cycles per branch, i.e. 0.39 added to the CPI (a ~30% increase).

Eliminating hazards
The task of removing data dependencies can be handed to the compiler, which can fill in an appropriate number of NOP instructions between dependent instructions to ensure correct operation, or re-order instructions where possible. Other methods include on-chip solutions such as:
- the scoreboarding method
- Tomasulo's method

Bubbling the Pipeline
Bubbling the pipeline (a technique also known as a pipeline break or pipeline stall) is a method for preventing data, structural, and branch hazards from occurring. As instructions are fetched, control logic determines whether a hazard could/will occur. If so, the control logic inserts NOPs into the pipeline. Thus, before the next instruction (which would cause the hazard) is executed, the previous one will have had sufficient time to complete and prevent the hazard. If the number of NOPs equals the number of stages in the pipeline, the processor has been cleared of all instructions and can proceed free from hazards; this is called flushing the pipeline. All forms of stalling introduce a delay before the processor can resume execution.

Eliminating data hazards: forwarding
Forwarding involves feeding output data back into an earlier stage of the pipeline. For instance, suppose we want to write the value 3 to register 1 (which already contains a 6), and then add 7 to register 1 and store the result in register 2, i.e.:

Instruction 0: Register 1 = 6
Instruction 1: Register 1 = 3
Instruction 2: Register 2 = Register 1 + 7 = 10

Following execution, register 2 should contain the value 10. However, if Instruction 1 (write 3 to register 1) does not completely exit the pipeline before Instruction 2 starts executing, register 1 does not yet contain the value 3 when Instruction 2 performs its addition. In such an event, Instruction 2 adds 7 to the old value of register 1 (6), and so register 2 would contain 13 instead, i.e.:

Instruction 0: Register 1 = 6
Instruction 1: Register 1 = 3
Instruction 2: Register 2 = Register 1 + 7 = 13

This error occurs because Instruction 2 reads register 1 before Instruction 1 has committed/stored the result of its write to register 1. So when Instruction 2 reads the contents of register 1, it still contains 6, not 3.

Forwarding helps correct such errors by relying on the fact that the output of Instruction 1 (which is 3) can be used by subsequent instructions before the value 3 is committed to/stored in register 1. Forwarding is implemented by feeding the output of an instruction back into the earlier stage(s) of the pipeline as soon as that output is available. Forwarding applied to our example means that we do not wait to commit/store the output of Instruction 1 in register 1 before making that output available to the subsequent instruction (in this case, Instruction 2). The effect is that Instruction 2 uses the correct (more recent) value of register 1, as if the commit/store had been made immediately rather than pipelined.

With forwarding enabled, the ID/EX stage of the pipeline now has two inputs: the value read from the register specified (in this example, the value 6 from register 1), and the new value of register 1 (in this example, 3) which is sent from the next stage (EX/MEM). Additional control logic is used to determine which input to use.

Forwarding Unit
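The "additional control logic" of the forwarding unit can be sketched as a comparison of register numbers between pipeline latches, in the MIPS/DLX style. The Python fragment below is only an illustrative model under assumed latch field names; it is not the forwarding unit of any specific processor.

# Illustrative forwarding-unit logic: the EX stage compares its source register
# number with the destination registers held in the EX/MEM and MEM/WB pipeline
# latches and picks the newest available value. Field names are assumptions.

def forward_source(src_reg, reg_file_value, ex_mem, mem_wb):
    """Return the value to feed the ALU for source register src_reg."""
    # Priority 1: result sitting in EX/MEM (the most recent producer).
    if ex_mem["reg_write"] and ex_mem["dest"] == src_reg and src_reg != 0:
        return ex_mem["alu_result"]
    # Priority 2: result sitting in MEM/WB (an older producer).
    if mem_wb["reg_write"] and mem_wb["dest"] == src_reg and src_reg != 0:
        return mem_wb["value"]
    # Otherwise the value read from the register file is already correct.
    return reg_file_value

if __name__ == "__main__":
    ex_mem = {"reg_write": True, "dest": 1, "alu_result": 3}   # instruction writing R1 = 3
    mem_wb = {"reg_write": False, "dest": 0, "value": 0}
    # The dependent add reads R1; the register file still holds the stale 6.
    print(forward_source(src_reg=1, reg_file_value=6, ex_mem=ex_mem, mem_wb=mem_wb))  # -> 3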

Load-use stall
Forwarding cannot remove every stall: when the instruction immediately following a load needs the loaded value, the data is not available until the end of the load's MEM stage, so even with forwarding one bubble (a load-use stall) must be inserted before the dependent instruction can execute.
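A minimal illustration of the load-use case (the instruction sequence is assumed for illustration, in the style of the examples above):
    LW  R1,0(R2)     ; R1 is loaded from memory (value ready only after MEM)
    ADD R3,R1,R4     ; needs R1 in its EX stage -> one stall cycle even with forwarding
    SUB R5,R1,R6     ; two instructions after the load, R1 can be forwarded with no further stall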


These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions. MIPS Pipe Line 2 Introduction Pipelining To complete an instruction a computer needs to perform a number of actions. These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously

More information

Structure of Computer Systems

Structure of Computer Systems 288 between this new matrix and the initial collision matrix M A, because the original forbidden latencies for functional unit A still have to be considered in later initiations. Figure 5.37. State diagram

More information

MaanavaN.Com CS1202 COMPUTER ARCHITECHTURE

MaanavaN.Com CS1202 COMPUTER ARCHITECHTURE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK SUB CODE / SUBJECT: CS1202/COMPUTER ARCHITECHTURE YEAR / SEM: II / III UNIT I BASIC STRUCTURE OF COMPUTER 1. What is meant by the stored program

More information

Appendix C: Pipelining: Basic and Intermediate Concepts

Appendix C: Pipelining: Basic and Intermediate Concepts Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

In embedded systems there is a trade off between performance and power consumption. Using ILP saves power and leads to DECREASING clock frequency.

In embedded systems there is a trade off between performance and power consumption. Using ILP saves power and leads to DECREASING clock frequency. Lesson 1 Course Notes Review of Computer Architecture Embedded Systems ideal: low power, low cost, high performance Overview of VLIW and ILP What is ILP? It can be seen in: Superscalar In Order Processors

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts CPE 110408443 Computer Architecture Appendix A: Pipelining: Basic and Intermediate Concepts Sa ed R. Abed [Computer Engineering Department, Hashemite University] Outline Basic concept of Pipelining The

More information

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise CSCI 4717/5717 Computer Architecture Topic: Instruction Level Parallelism Reading: Stallings, Chapter 14 What is Superscalar? A machine designed to improve the performance of the execution of scalar instructions.

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently Pipelining Computational assembly line Each step does a small fraction of the job All steps ideally operate concurrently A form of vertical concurrency Stage/segment - responsible for 1 step 1 machine

More information

Materials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming

Materials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming CPS311 Lecture: CPU Control: Hardwired control and Microprogrammed Control Last revised October 23, 2015 Objectives: 1. To explain the concept of a control word 2. To show how control words can be generated

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Practice Problems (Con t) The ALU performs operation x and puts the result in the RR The ALU operand Register B is loaded with the contents of Rx

Practice Problems (Con t) The ALU performs operation x and puts the result in the RR The ALU operand Register B is loaded with the contents of Rx Microprogram Control Practice Problems (Con t) The following microinstructions are supported by each CW in the CS: RR ALU opx RA Rx RB Rx RB IR(adr) Rx RR Rx MDR MDR RR MDR Rx MAR IR(adr) MAR Rx PC IR(adr)

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Superscalar Processors Ch 14

Superscalar Processors Ch 14 Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency? Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Superscalar Machines. Characteristics of superscalar processors

Superscalar Machines. Characteristics of superscalar processors Superscalar Machines Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any performance

More information

UNIT- 5. Chapter 12 Processor Structure and Function

UNIT- 5. Chapter 12 Processor Structure and Function UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

omputer Design Concept adao Nakamura

omputer Design Concept adao Nakamura omputer Design Concept adao Nakamura akamura@archi.is.tohoku.ac.jp akamura@umunhum.stanford.edu 1 1 Pascal s Calculator Leibniz s Calculator Babbage s Calculator Von Neumann Computer Flynn s Classification

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Preventing Stalls: 1

Preventing Stalls: 1 Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:

More information

Computer Architectures. DLX ISA: Pipelined Implementation

Computer Architectures. DLX ISA: Pipelined Implementation Computer Architectures L ISA: Pipelined Implementation 1 The Pipelining Principle Pipelining is nowadays the main basic technique deployed to speed-up a CP. The key idea for pipelining is general, and

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence

More information