SUB NAME: COMPUTER ARCHITECTURE AND ORGANIZATION    BRANCH: ECE    SUB CODE: EC2303    YEAR/SEM: III / V

UNIT-III CONTROL DESIGN

PART-A

1. What are the advantages and disadvantages of hardwired and microprogrammed control? (AUC NOV 07)
Hardwired control
Advantages: Hardwired control works fast; a combinational circuit generates the control signals directly from the status of the input signals.
Disadvantages: The control unit design becomes very complex.
Microprogrammed control
Advantages: The design of the microprogram is less complex, and the microprogram is flexible.
Disadvantages: A microprogrammed CPU is slower and more expensive.

2. What is a microprogram? (AUC NOV 09)
A microinstruction explicitly or implicitly specifies the next microinstruction to be used, thereby providing the necessary information for micro-operation sequencing. A set of related microinstructions forms a microprogram.

3. State the differences between hardwired control and microprogrammed control. (AUC MAY 07, APR 11)
S.No | Hardwired control                             | Microprogrammed control
1    | Implemented in hardware                       | Implemented in software
2    | Digital circuits generate the control signals | The control signals are stored as bit patterns in a ROM
3    | It is the conventional design technique       | It is the modern design technique

4. What is hardwired control?
A hardwired control unit consists of combinational circuits that generate the various control signals. This scheme is used to minimize cost while achieving higher efficiency in terms of operation speed.

5. What is a microinstruction, and what factor should determine the length of the microinstruction? (AUC APR 11)
Every instruction in the CPU is implemented by a sequence of one or more sets of concurrent micro-operations. Each micro-operation is associated with a specific set of control lines which, when activated, cause that micro-operation to take place. The size of the microinstruction should be kept to a minimum so that the control memory required to store the microinstructions is small.

6. What is the drawback of assigning one bit to each control signal? (AUC NOV 12)
Assigning an individual bit to each control signal results in long microinstructions, because the number of required control signals is usually large.

7. What are static branch prediction and dynamic branch prediction? (AUC NOV 12)
In static branch prediction, the prediction is the same every time a given branch instruction is executed. In dynamic branch prediction, the prediction decision may change depending on execution history.

8. Define control word (CW) and control store.
A control word is a word whose individual bits represent the various control signals. The microroutines for all instructions in the instruction set of a computer are stored in a special memory called the control store.

9. Draw the structure of a hardwired control unit.

10. Draw the structure of a microprogrammed control unit.

11. What is a processor clock?
All the operations and data transfers within the processor take place within time periods defined by a clock cycle, called the processor clock.

12. What is meant by nanoprogramming?
Nanoprogramming uses a second level of control storage to reduce the number of control bits required to interpret an instruction set.

13. What is the purpose of the WMFC signal?
WMFC (Wait for Memory Function Completed) is a control signal that causes the processor's control circuitry to wait for the arrival of the MFC signal.

14. What is a microroutine?
The sequence of control words corresponding to the control sequence of a machine instruction constitutes a microroutine.

15. What is pipelining?
Pipelining is a technique of executing machine instructions concurrently by overlapping their execution; it is also called assembly-line operation.

16. Define hazard.
Any condition that causes the pipeline to stall is called a hazard.

PART-B

1. Explain the organization of hardwired control in detail. (AUC NOV 06)

Assume a sequence of n steps in which the control unit changes from one state to the next in each step, from S0 to Sn-1, and that the inputs are I0 to Ix-1. At each state, the set of outputs Oi,0 to Oi,m-1 depends on the current state and the inputs. The control unit generates its states by a sequential circuit based on the Moore state machine concept: each output signal Oi,0 to Oi,m-1 depends on the current state only.

Hardwired Control Unit
The outputs Oi,0 to Oi,m-1 are produced in the sequentially generated states. In each state, three types of control signals are generated:
1. Select function signals: f1, f2, ..., fy-2, fy-1
2. Select storage unit signals: r1, r2, ..., rz-2, rz-1
3. Select data route signals: d1, d2, ..., dr-2, dr-1

Moore state machine: the ith set of outputs is produced for a given set of inputs in the ith state.

Hardwired control unit organization: a sequence counter counting from 0 to n-1 drives an encoder, which generates the outputs Oi,0 to Oi,m-1 in the sequentially generated states.
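As a rough illustration of the Moore-machine idea, the following Python sketch models a control unit whose asserted signals are a function of the current sequence-counter value only. The signal sets mirror the step j (MAR <- PC) and step j+1 (PC <- PC + 4) control expressions worked out below; the class structure and the set representation of a control word are assumptions made only for this sketch.

# Minimal, illustrative Moore-machine control unit: the asserted control
# signals depend only on the current state (the sequence-counter value).
# The Python structure is an assumption for illustration; the signal names
# follow the step j / step j+1 expressions derived later in this answer.

CONTROL_ROM = {
    0: {"f1", "d1", "r1", "r4"},                        # step j  : MAR <- PC
    1: {"f1", "d1", "r6", "r5", "r8", "r7", "r1",
        "r10", "r9", "f2", "r11", "r12"},               # step j+1: PC <- PC + 4
}

class HardwiredControl:
    def __init__(self, num_states):
        self.num_states = num_states
        self.state = 0                                   # sequence counter c0..cn-1

    def outputs(self):
        # Moore property: the outputs depend on the current state only.
        return CONTROL_ROM.get(self.state, set())

    def step(self):
        signals = self.outputs()
        self.state = (self.state + 1) % self.num_states  # clock advances the counter
        return signals

if __name__ == "__main__":
    cu = HardwiredControl(num_states=2)
    for cycle in range(2):
        print(f"cycle {cycle}: asserted signals = {sorted(cu.step())}")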

The encoder takes inputs from the sequence counter, the instruction decoder, condition/status flags and external inputs. External inputs can come from an external interrupting source or from an external device requesting access to the external buses. Condition/status flags reflect the conditions set by earlier instructions.

Control Unit Organization
The encoder sends a reset signal after the end of an instruction and a stop signal to the sequencer after the last step. The encoder also sends a count-start signal to let the clock increment the counter during the processing of an instruction.

Control Unit Outputs
Three sets of control signals are output in the various states:
(1) Select function signals: f1, f2, ..., fy-2, fy-1
(2) Select storage unit signals: r1, r2, ..., rz-2, rz-1
(3) Select data route signals: d1, d2, ..., dr-2, dr-1

Operations by the control unit using the encoder
Output control signals in the first two steps:
(i)  Step j:     MAR <- PC
(ii) Step j + 1: PC <- PC + 4 (for 32-bit memory word alignment)

The sequence counter resets when an instruction transfers to the instruction decoder after decoding, and starts counting when the encoder activates the count signal. Let c0, c1, ..., cn-1 be the sequence counter outputs from the instant the count output of the encoder is activated.

Storage unit control signals r1 to r12:
r1. PC output control
r2. PC input control
r3. MAR output control
r4. MAR input control
r5. Constant 4 register output control

r6. Constant 4 register input control
r7. Arithmetic unit input X output control
r8. Arithmetic unit input X input control
r9. Arithmetic unit input Y output control
r10. Arithmetic unit input Y input control
r11. Arithmetic unit output Z input control
r12. Z output control

Data route select control signals d1 to d4:
d1. Internal bus control
d2. External address bus output control
d3. External data bus output control
d4. External data bus input control

Step j: MAR <- PC is implemented by the control signal combination
    r4 = c0.f1.d1.r1
Step j + 1: PC <- PC + 4 is implemented in the next sequence by the control signal combination
    r2 = c1.f1.d1.r6.r5.r8.r7.r1.r10.r9.f2.r11.r12

2. Explain the organization of a microprogrammed control unit. (AUC NOV 06, NOV 07, MAY 06, NOV 11, NOV 12)

Microprogramming is an orderly method of designing the control unit of a conventional computer. The term is based on the analogy between the sequence of transfers required to execute a machine instruction and the sequence of individual instructions in a conventional user program. Each step is called a microinstruction, and the complete set of steps required to process a machine instruction is called the microprogram.


Microcode Execution:
1. The op-code is decoded.
2. Microinstructions are retrieved from control memory (the control address register and the decoder serve as the address register and selection mechanism of the control unit).
3. The control address register locates the microinstruction to be retrieved from control memory.
4. The microinstruction register holds the retrieved microinstruction: the micro op-code and the address of the next microinstruction in control memory.
5. The current microinstruction is executed.
6. The address of the next microinstruction is entered into the control address register to retrieve the next microinstruction from control memory.
7. If all microinstructions have been executed, the op-code of the next conventional instruction is placed in the control address register; if not, the remaining microinstructions are executed.
8. Conditional jumps are implemented by letting the states of some conditional flip-flops modify the address of the next microinstruction to be retrieved.

VARIATIONS IN WILKES' IMPLEMENTATION
1. Microinstruction address selection.
2. Instruction set strategies (control word organization).

1) The control address register is modified to function as a counter when the next microinstruction follows the current address, and as a register when a branch is required (similar to Wilkes' scheme) - the control address register acts as a counter or a register.
2) Addition of a register to hold the address of the next microinstruction (the microprogram counter, uPC).
3) Control memory may be organized in two storage matrices, each with its own decoding tree, providing faster operation at lower cost, since the size of the decoder is reduced.
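The microinstruction fetch-execute cycle described in steps 1 to 8 above can be sketched in a few lines of Python. This is only an illustrative model under assumed microinstruction fields (a set of control signals plus a next-address field); it is not the exact Wilkes organization or any particular machine's format.

# Illustrative model of microprogrammed control: a control store holds
# microinstructions, a control address register (CAR) selects the current one,
# and each microinstruction carries the control signals to assert plus the
# address of its successor. Field names and contents are assumptions.

from dataclasses import dataclass

@dataclass
class MicroInstruction:
    control_signals: tuple    # signals asserted during this micro-step
    next_address: int         # address of the next microinstruction (-1 = end of routine)

# A tiny micro-routine, e.g. an instruction fetch: MAR<-PC, Read, PC<-PC+4, IR<-MDR
CONTROL_STORE = {
    0: MicroInstruction(("PCout", "MARin", "Read"), 1),
    1: MicroInstruction(("PCout", "Select4", "Add", "Zin"), 2),
    2: MicroInstruction(("Zout", "PCin", "WMFC"), 3),
    3: MicroInstruction(("MDRout", "IRin"), -1),
}

def run_microroutine(start_address):
    car = start_address                       # control address register
    while car != -1:
        mir = CONTROL_STORE[car]              # microinstruction register
        print(f"CAR={car}: assert {mir.control_signals}")
        car = mir.next_address                # sequencing via the next-address field

if __name__ == "__main__":
    run_microroutine(0)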


The next address for microcode execution may be specified by:
1) The address field of the current microinstruction
2) The uPC (microprogram counter)
3) An address from the address control store

Microinstruction formats:
1) No encoding
2) Some encoding
3) Complete encoding

ADVANTAGES & APPLICATIONS OF MICROPROGRAMMING
1) THE SYSTEMATIZATION OF CONTROL
2) IMPROVEMENT IN PERFORMANCE
   a) a high degree of parallelism in the data paths, e.g., multiple-bit microinstructions performed in one cycle
   b) a high degree of decision logic (in table search and sorting routines)
3) COMPUTER-SERIES COMPATIBILITY
   Compatibility of instruction sets between smaller and larger machines of a series, e.g., Intel 286, 386, Pentium; IBM Systems/309x; Motorola series.
4) EMULATION
   Emulation is the combined software/hardware interpretation of the machine instructions of one machine by another. The target machine's architecture is mapped onto the host machine.
   EMULATOR - a set of microprograms that interpret a particular instruction set or language L1. Computer C1 emulates computer C2 if it can interpret machine language L2.

3. List the differences between hardwired control and microprogrammed control. (APRIL/MAY 2008)

Hardwired control is a control mechanism that generates control signals using an appropriate finite state machine (FSM). Microprogrammed control is a control mechanism that generates control signals using a memory called the control storage (CS), which contains the control signals. Although microprogrammed control seems advantageous for CISC machines, since CISC requires systematic development of sophisticated control signals, there is no intrinsic difference between these two control mechanisms.

The pair "microinstruction register" and "control storage address register" can be regarded as the "state register" of a hardwired control unit. Note that the control storage can be regarded as a kind of combinational logic circuit: we can assign any 0/1 values to each output corresponding to each address, which can be regarded as the input of a combinational logic circuit. This is a truth table.

CISC can also be implemented using hardwired control. In the above sense, microprogrammed control is not always necessary to implement CISC machines; hardwired control can also be used to implement sophisticated CISC machines. The bases of this opinion are as follows:
- The same field configuration (state assignment) can be used for both types of control. This is clear because of the identification above.
- We can use any large FSM with a horizontal-microcode-like state assignment, since the delay of the hardwired control logic (FSM) does not matter as long as it is less than or equal to the delay of the data path (which includes adders, shifters and so on), because the control logic works in parallel with the data path.
- The horizontal-microcode-like state assignment has become very easy to implement because of the spread of hardware description languages (HDLs). In Verilog HDL, `define statements let us obtain a correct net-list for a large FSM in a very short time using an appropriate logic synthesizer. The "parameter" statement can also be used for the state assignment in Verilog HDL.

CISC and RISC are the two major types of ordinary SISD machines. Since hardwired control has historically been faster, both types of machines are implemented using hardwired control in the microcomputer design educational environment.

4. Explain superscalar operation. (AUC NOV 11, NOV 12)

A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently. Pipelining allows several instructions to be executed at the same time, but they have to be in different pipeline stages at a given moment. Superscalar architectures include all the features of pipelining but, in addition, several instructions can execute simultaneously in the same pipeline stage.

There are two typical approaches today for improving performance:
1. Superpipelining
2. Superscalar execution

Superpipelining is based on dividing the pipeline stages into sub-stages, thus increasing the number of instructions which are handled by the pipeline at a given moment. By dividing each stage into two, the clock cycle period t is reduced to half, t/2; hence, at maximum capacity, the pipeline produces a result every t/2 seconds. For a given architecture and the corresponding instruction set there is an optimal number of pipeline stages; increasing the number of stages beyond this limit reduces the overall performance. A solution to further improve speed is the superscalar architecture.

Superscalar Architectures
Superscalar architectures allow several instructions to be issued and completed per clock cycle. A superscalar architecture consists of a number of pipelines working in parallel. Depending on the number and kind of parallel units available, a certain number of instructions can be executed in parallel. In the following example a floating-point operation and two integer operations can be issued and executed simultaneously; each unit is pipelined and can execute several operations in different pipeline stages.

Limitations on Parallel Execution
The situations which prevent instructions from being executed in parallel by a superscalar architecture are very similar to those which prevent efficient execution on any pipelined architecture. The consequences of these situations are more severe for superscalar architectures than for simple pipelines, because the potential for parallelism in superscalars is greater and, thus, a greater opportunity is lost.

1. Resource conflicts:
- They occur if two or more instructions compete for the same resource (register, memory, functional unit) at the same time; they are similar to the structural hazards discussed with pipelines. By introducing several parallel pipelined units, superscalar architectures try to reduce some of the possible resource conflicts.

2. Control (procedural) dependency:
- The presence of branches creates major problems in assuring optimal parallelism.
- If instructions are of variable length, they cannot be fetched and issued in parallel; an instruction has to be decoded in order to identify the following one and to fetch it. Superscalar techniques are therefore efficiently applicable to RISCs, with fixed instruction length and format.

3. Data conflicts:
- Data conflicts are produced by data dependencies between instructions in the program. Because superscalar architectures provide great freedom in the order in which instructions can be issued and completed, data dependencies have to be considered with much attention.

Data Dependencies

All instructions in the window of execution may begin execution, subject to data dependence (and resource) constraints. Three types of data dependencies can be identified:
1. True data dependency
2. Output dependency
3. Anti-dependency (the last two are artificial dependencies)

True Data Dependency
A true data dependency exists when the output of one instruction is required as an input to a subsequent instruction:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R2,R4,R5    ; R2 <- R4 + R5
True data dependencies are intrinsic features of the user's program. They cannot be eliminated by compiler or hardware techniques. True data dependencies have to be detected and treated: the addition above cannot be executed before the result of the multiplication is available.

Output Dependency
An output dependency exists if two instructions write into the same location; if the second instruction writes before the first one, an error occurs:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R4,R2,R5    ; R4 <- R2 + R5
The same situation appears, for example, in the following code sequence:

    L2: move r3,r7
        lw   r8,(r3)
        add  r3,r3,4
        lw   r9,(r3)
        ble  r8,r9,L3
Here r3 is written by both the move and the add instructions, giving an output dependency.

Anti-dependency
An anti-dependency exists if an instruction uses a location as an operand while a following one writes into that location; if the first one is still using the location when the second one writes into it, an error occurs:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R3,R2,R5    ; R3 <- R2 + R5
In the same code sequence:
    L2: move r3,r7
        lw   r8,(r3)
        add  r3,r3,4
        lw   r9,(r3)
        ble  r8,r9,L3
the lw instructions read r3, which is then overwritten by the add, giving anti-dependencies.

The Nature of Output Dependency and Anti-dependency
Output dependencies and anti-dependencies are not intrinsic features of the executed program; they are not real data dependencies but storage conflicts. They are only a consequence of the manner in which the programmer or the compiler uses registers (or memory locations); they are produced by the competition of several instructions for the same register. In the previous examples the conflicts are produced only because:
- output dependency: R4 is used by both instructions to store the result;
- anti-dependency: R3 is used by the second instruction to store the result.
The examples could be written without dependencies by using additional registers:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R7,R2,R5    ; R7 <- R2 + R5
and
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R6,R2,R5    ; R6 <- R2 + R5

Register Renaming
Output dependencies and anti-dependencies can be treated like true data dependencies, as normal conflicts: such conflicts are solved by delaying the execution of an instruction until it can safely execute. Parallelism can be improved, however, by eliminating output dependencies and anti-dependencies, since they are not real data dependencies. They can be eliminated by automatically allocating new registers to values when such a dependency has been detected. This technique is called register renaming.

The output dependency is eliminated by allocating, for example, R6 to the value R2 + R5:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R4,R2,R5    ; R4 <- R2 + R5    (renamed: ADD R6,R2,R5 ; R6 <- R2 + R5)
The same is done for the anti-dependency below:
    MUL R4,R3,R1    ; R4 <- R3 * R1
    ADD R3,R2,R5    ; R3 <- R2 + R5    (renamed: ADD R6,R2,R5 ; R6 <- R2 + R5)
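As a rough illustration of the renaming idea (not a description of any particular processor's renaming hardware), the following Python sketch maps each architectural destination register to a fresh physical register; the instruction representation and the physical register pool are assumptions made for this sketch only.

# Illustrative register-renaming sketch: every architectural destination
# register gets a fresh physical register, so later writers no longer conflict
# with earlier readers/writers (WAR and WAW disappear, true RAW remains).
# Instruction format and register pool are assumptions for illustration.

from itertools import count

def rename(program):
    """program: list of (opcode, dest, src1, src2) using architectural names."""
    phys = count()                              # generator of fresh physical registers
    mapping = {}                                # architectural name -> current physical name
    renamed = []
    for op, dest, src1, src2 in program:
        # Sources read the most recent mapping (true RAW dependencies are preserved).
        s1 = mapping.get(src1, src1)
        s2 = mapping.get(src2, src2)
        # The destination always gets a fresh physical register (removes WAW/WAR).
        d = f"P{next(phys)}"
        mapping[dest] = d
        renamed.append((op, d, s1, s2))
    return renamed

if __name__ == "__main__":
    prog = [("MUL", "R4", "R3", "R1"),          # R4 <- R3 * R1
            ("ADD", "R4", "R2", "R5"),          # output dependency on R4
            ("SUB", "R8", "R4", "R3")]          # true dependency on the ADD result
    for ins in rename(prog):
        print(ins)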

5. Explain the performance of pipelining.

A stall causes the pipeline performance to degrade from the ideal performance. The ideal CPI on a pipelined machine is almost always 1. Hence, the pipelined CPI is

    CPI pipelined = ideal CPI + pipeline stall cycles per instruction
                  = 1 + pipeline stall cycles per instruction

and the speedup from pipelining is

    Speedup = (CPI unpipelined / CPI pipelined) x (clock cycle unpipelined / clock cycle pipelined)

If we ignore the cycle-time overhead of pipelining and assume the stages are all perfectly balanced, then the cycle times of the two machines are equal and

    Speedup = CPI unpipelined / (1 + pipeline stall cycles per instruction)

If all instructions take the same number of cycles, which must also equal the number of pipeline stages (the depth of the pipeline), then the unpipelined CPI is equal to the depth of the pipeline, leading to

    Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)

If there are no pipeline stalls, this leads to the intuitive result that pipelining can improve performance by the depth of the pipeline. For example, with a 5-stage pipeline and an average of 0.5 stall cycles per instruction, the speedup over the unpipelined machine is 5 / 1.5, i.e. about 3.3.

6. Explain in detail about nanoprogramming. (AUC NOV 11)

Nanoprogramming uses a two-level control storage organization:
- The top level is a vertical-format memory.
- The output of the top-level memory drives the address register of the bottom (nano-level) memory.
- The nano memory uses the horizontal format and produces the actual control signal outputs.
The advantage of this approach is a significant saving in control memory size (bits); the disadvantage is more complexity and slower operation.

Nano-programmed machine
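The two-level organization just described can be modelled as a pair of table lookups, as in the Python sketch below. The store contents and widths are made-up assumptions for illustration; only the structure (short vertical micro words indexing wide horizontal nano words) follows the description above.

# Illustrative two-level (nano-programmed) control store: the micro level holds
# short words that are merely indices into the nano store, and the nano store
# holds the wide horizontal control words. Contents are assumptions.

# Nano store: each entry is a wide horizontal control word (a tuple of signals).
NANO_STORE = [
    ("PCout", "MARin", "Read"),
    ("PCout", "Select4", "Add", "Zin"),
    ("Zout", "PCin"),
]

# Micro store: each microinstruction is just a short pointer into the nano store,
# so identical control-word combinations are stored only once.
MICRO_STORE = [0, 1, 2, 0, 1, 2]   # e.g. two micro-routines sharing nano words

def control_word(micro_address):
    nano_address = MICRO_STORE[micro_address]   # first-level lookup (vertical format)
    return NANO_STORE[nano_address]             # second-level lookup (horizontal format)

if __name__ == "__main__":
    for addr in range(len(MICRO_STORE)):
        print(addr, control_word(addr))

Because several microinstructions can point at the same nano word, each wide control word is stored only once; this sharing is the source of the bit savings computed in the example that follows.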

Example: Suppose that a system is being designed with 200 control points and 2048 microinstructions, and assume that only 256 different combinations of control points are ever used.
- A single-level control memory would require 2048 x 200 = 409,600 storage bits.
- A nano-programmed system would use:
    Micro store of size 2048 x 8 = 16,384 bits (8 bits suffice to address the 256 nano words)
    Nano store of size 256 x 200 = 51,200 bits
    Total size = 67,584 storage bits
Nanoprogramming has been used in many CISC microprocessors.

Applications of Microprogramming
Microprogramming application: emulation
- The use of a microprogram on one machine to execute programs originally written to run on another (different!) machine.
- By changing the microcode of a machine, you can make it execute software from another machine.
- Commonly used in the past to permit new machines to continue to run old software; the VAX had two modes: normal mode and an emulation mode for the PDP-11.
- The Nanodata QM-1 machine was marketed with no native instruction set - a universal emulation engine.

7. Discuss the various hazards that might arise in a pipeline. What remedies are commonly adopted to overcome/minimize these hazards? (AUC NOV 12)

Hazard
In computer architecture, a hazard is a potential problem that can happen in a pipelined processor. It refers to the possibility of erroneous computation when a CPU tries to simultaneously execute multiple instructions which exhibit data dependence. There are typically three types of hazards: data hazards, structural hazards, and branching hazards (control hazards). Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being executed, and instructions may not be completed in the desired order.

A hazard occurs when two or more of these simultaneous (possibly out-of-order) instructions conflict.

1. Data hazards
   RAW - Read After Write
   WAR - Write After Read
   WAW - Write After Write
2. Structural hazards
3. Branch (control) hazards
4. Eliminating hazards (eliminating data hazards)
5. Eliminating branch hazards

Data hazards
A major effect of pipelining is to change the relative timing of instructions by overlapping their execution. This introduces data and control hazards. Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine.

Consider the pipelined execution of a sequence of instructions in which all the instructions after an ADD use the result of the ADD instruction (in R1); a reconstruction of the sequence is given below. The ADD instruction writes the value of R1 in its WB stage, but the SUB instruction reads the value during its ID stage. This problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it.
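The instruction sequence discussed here appeared as a figure in the original notes; it is the classic example from Hennessy and Patterson and can be reconstructed along the following lines (R1 is fixed by the discussion, the other register numbers are assumptions):
    ADD R1,R2,R3
    SUB R4,R1,R5
    AND R6,R1,R7
    OR  R8,R1,R9
    XOR R10,R1,R11
Each of the SUB, AND, OR and XOR instructions reads R1, which is written by the ADD.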

The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of clock cycle 5. Thus, the AND instruction, which reads its registers during cycle 4 (its ID stage), will receive the wrong result.

The OR instruction can be made to operate without incurring a hazard by a simple implementation technique: perform register-file reads in the second half of the cycle and writes in the first half. Because both the WB stage of the ADD and the ID stage of the OR fall in cycle 5, the write to the register file by the ADD is performed in the first half of the cycle, and the read of the registers by the OR is performed in the second half.

The XOR instruction operates properly, because its register read occurs in cycle 6, after the register write by the ADD.

We now look more closely at data hazards and consider the cases in which stalls cannot be eliminated. A hazard is created whenever there is a dependence between instructions, and they are close enough that the overlap caused by pipelining would change the order of access to an operand. Our example hazards have all been with register operands, but it is also possible to create a dependence by writing and reading the same memory location. In the DLX pipeline, however, memory references are always kept in order, preventing this type of hazard from arising. All the data hazards discussed here involve registers within the CPU.

By convention, the hazards are named by the ordering in the program that must be preserved by the pipeline. Consider two instructions i and j, with i occurring before j. The possible data hazards are:

RAW (read after write) - j tries to read a source before i writes it, so j incorrectly gets the old value. This is the most common type of hazard and the kind that forwarding is used to overcome.

WAW (write after write) - j tries to write an operand before it is written by i. The writes end up being performed in the wrong order, leaving the value written by i rather than the value written by j in the destination. This hazard is present only in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled. The DLX integer pipeline writes a register only in WB and so avoids this class of hazards. WAW hazards would be possible if we made the following two changes to the DLX pipeline: move the write-back for an ALU operation into the MEM stage, since the data value is available by then, and suppose that the data memory access takes two pipe stages. Unless this hazard is avoided, execution of such a sequence on the revised pipeline would leave the result of the first write (the LW) in R1, rather than the result of the ADD. Allowing writes in different pipe stages also introduces other problems, since two instructions can try to write during the same clock cycle. The DLX FP pipeline, which has writes in different stages and different pipeline lengths, has to deal with both write conflicts and WAW hazards.

WAR (write after read) - j tries to write a destination before it is read by i, so i incorrectly gets the new value.

This hazard occurs when there are some instructions that write results early in the instruction pipeline and other instructions that read a source late in the pipeline. Because of the natural structure of a pipeline, which typically reads values before it writes results, such hazards are rare. Pipelines for complex instruction sets that support auto-increment addressing and require operands to be read late in the pipeline could create WAR hazards. If we modified the DLX pipeline as in the above example and also read some operands late, such as the source value for a store instruction, a WAR hazard could occur: if the SW reads R2 during the second half of its MEM2 stage and the ADD writes R2 during the first half of its WB stage, the SW will incorrectly read and store the value produced by the ADD.

RAR (read after read) - this is not a hazard, since reading the same location twice in either order causes no error.

Structural hazards
A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time. A structural hazard might occur, for instance, if a program were to execute a branch instruction followed by a computation instruction. Because they are executed in parallel, and because branching is typically slow (requiring a comparison, program-counter-related computation, and writing to registers), it is quite possible (depending on the architecture) that the computation instruction and the branch instruction will both require the ALU (arithmetic logic unit) at the same time.

When a machine is pipelined, the overlapped execution of instructions requires pipelining of the functional units and duplication of resources to allow all possible combinations of instructions in the pipeline. If some combination of instructions cannot be accommodated because of a resource conflict, the machine is said to have a structural hazard. Common instances of structural hazards arise when:
- Some functional unit is not fully pipelined. Then a sequence of instructions using that unpipelined unit cannot proceed at the rate of one per clock cycle.
- Example 1: a machine may have only one register-file write port, but in some cases the pipeline might want to perform two writes in a clock cycle.
To resolve such a conflict, we stall the pipeline for one clock cycle when the conflicting access occurs. The effect of the stall is to occupy the resources for that instruction slot. (In the pipeline timing table showing how the stalls are implemented, instruction 1 is assumed not to be a data-memory reference, i.e. a load or store; otherwise instruction 3 could not start execution for the same reason.) Introducing stalls degrades performance.

There are two reasons for allowing structural hazards:
- To reduce cost.
- To reduce the latency of the unit: the shorter latency comes from the lack of pipeline registers, which introduce overhead.
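The RAW/WAW/WAR definitions above can be checked mechanically: given the registers each instruction reads and writes, a later instruction j has a hazard with an earlier instruction i exactly when their access sets overlap in the corresponding way. The Python sketch below is illustrative only; the instruction representation (plain read/write sets) is an assumption, not a real ISA model.

# Classify the possible data hazards between an earlier instruction i and a
# later instruction j from their register read/write sets.

def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    hazards = []
    if i_writes & j_reads:
        hazards.append("RAW")   # j reads something i writes
    if i_writes & j_writes:
        hazards.append("WAW")   # both write the same location
    if i_reads & j_writes:
        hazards.append("WAR")   # j writes something i still has to read
    return hazards

if __name__ == "__main__":
    # ADD R1,R2,R3 followed by SUB R4,R1,R5  ->  RAW on R1
    print(classify_hazards(i_reads={"R2", "R3"}, i_writes={"R1"},
                           j_reads={"R1", "R5"}, j_writes={"R4"}))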

Branch (control) hazards
Branching hazards (also known as control hazards) occur when the processor is told to branch - i.e., if a certain condition is true, then jump from one part of the instruction stream to another - not necessarily to the next instruction sequentially. In such a case, the processor cannot tell in advance whether it should process the next instruction (it may instead have to move to a distant instruction).

A cache miss is another source of stalls: a cache miss stalls all the instructions in the pipeline, both before and after the instruction causing the miss. Eliminating a hazard, in contrast, often requires that some instructions in the pipeline be allowed to proceed while others are delayed. When an instruction is stalled, all the instructions issued later than the stalled instruction are also stalled. Instructions issued earlier than the stalled instruction must continue, since otherwise the hazard can never be cleared. A hazard causes pipeline bubbles to be inserted.

Handling control hazards is very important:
- VAX: Emer and Clark report that 39% of instructions change the PC; a naive solution adds approximately 5 cycles every time, i.e. adds about 2 to the CPI (a ~20% increase).
- DLX: Hennessy and Patterson report 13% branches; a naive solution adds 3 cycles per branch, i.e. 0.39 added to the CPI (a ~30% increase).

Eliminating hazards
The task of removing data dependencies can be handed to the compiler, which can fill in an appropriate number of NOP instructions between dependent instructions to ensure correct operation, or re-order instructions where possible. Other methods include on-chip solutions such as:
- the scoreboarding method
- Tomasulo's method

Bubbling the Pipeline
Bubbling the pipeline (a technique also known as a pipeline break or pipeline stall) is a method for preventing data, structural, and branch hazards from occurring. As instructions are fetched, control logic determines whether a hazard could/will occur. If so, the control logic inserts NOPs into the pipeline. Thus, before the next instruction (which would cause the hazard) is executed, the previous one will have had sufficient time to complete and prevent the hazard. If the number of NOPs equals the number of stages in the pipeline, the processor has been cleared of all instructions and can proceed free from hazards; this is called flushing the pipeline. All forms of stalling introduce a delay before the processor can resume execution.

Eliminating data hazards: forwarding
Forwarding involves feeding output data back into an earlier stage of the pipeline. For instance, suppose we want to write the value 3 to register 1 (which already contains a 6), and then add 7 to register 1 and store the result in register 2, i.e.:

Instruction 0: Register 1 = 6
Instruction 1: Register 1 = 3
Instruction 2: Register 2 = Register 1 + 7 = 10

Following execution, register 2 should contain the value 10. However, if Instruction 1 (write 3 to register 1) does not completely exit the pipeline before Instruction 2 starts executing, register 1 does not yet contain the value 3 when Instruction 2 performs its addition. In such an event, Instruction 2 adds 7 to the old value of register 1 (6), and so register 2 would contain 13 instead, i.e.:

Instruction 0: Register 1 = 6
Instruction 1: Register 1 = 3
Instruction 2: Register 2 = Register 1 + 7 = 13

This error occurs because Instruction 2 reads register 1 before Instruction 1 has committed/stored the result of its write to register 1. So when Instruction 2 reads the contents of register 1, it still contains 6, not 3.

Forwarding helps correct such errors by relying on the fact that the output of Instruction 1 (which is 3) can be used by subsequent instructions before the value 3 is committed to/stored in register 1. Forwarding is implemented by feeding the output of an instruction back into the earlier stage(s) of the pipeline as soon as that output is available. Forwarding applied to our example means that we do not wait to commit/store the output of Instruction 1 in register 1 before making that output available to the subsequent instruction (in this case, Instruction 2). The effect is that Instruction 2 uses the correct (more recent) value of register 1, as if the commit/store had been made immediately rather than pipelined.

With forwarding enabled, the ID/EX stage of the pipeline now has two inputs: the value read from the register specified (in this example, the value 6 from register 1), and the new value of register 1 (in this example, 3) which is sent from the next stage (EX/MEM). Additional control logic is used to determine which input to use.

Forwarding Unit
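The "additional control logic" of the forwarding unit can be sketched as a comparison of register numbers between pipeline latches, in the MIPS/DLX style. The Python fragment below is only an illustrative model under assumed latch field names; it is not the forwarding unit of any specific processor.

# Illustrative forwarding-unit logic: the EX stage compares its source register
# number with the destination registers held in the EX/MEM and MEM/WB pipeline
# latches and picks the newest available value. Field names are assumptions.

def forward_source(src_reg, reg_file_value, ex_mem, mem_wb):
    """Return the value to feed the ALU for source register src_reg."""
    # Priority 1: result sitting in EX/MEM (the most recent producer).
    if ex_mem["reg_write"] and ex_mem["dest"] == src_reg and src_reg != 0:
        return ex_mem["alu_result"]
    # Priority 2: result sitting in MEM/WB (an older producer).
    if mem_wb["reg_write"] and mem_wb["dest"] == src_reg and src_reg != 0:
        return mem_wb["value"]
    # Otherwise the value read from the register file is already correct.
    return reg_file_value

if __name__ == "__main__":
    ex_mem = {"reg_write": True, "dest": 1, "alu_result": 3}   # instruction writing R1 = 3
    mem_wb = {"reg_write": False, "dest": 0, "value": 0}
    # The dependent add reads R1; the register file still holds the stale 6.
    print(forward_source(src_reg=1, reg_file_value=6, ex_mem=ex_mem, mem_wb=mem_wb))  # -> 3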

Load-use stall
Forwarding cannot remove every stall: when the instruction immediately following a load needs the loaded value, the data is not available until the end of the load's MEM stage, so even with forwarding one bubble (a load-use stall) must be inserted before the dependent instruction can execute.
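A minimal illustration of the load-use case (the instruction sequence is assumed for illustration, in the style of the examples above):
    LW  R1,0(R2)     ; R1 is loaded from memory (value ready only after MEM)
    ADD R3,R1,R4     ; needs R1 in its EX stage -> one stall cycle even with forwarding
    SUB R5,R1,R6     ; two instructions after the load, R1 can be forwarded with no further stall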


These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions. MIPS Pipe Line 2 Introduction Pipelining To complete an instruction a computer needs to perform a number of actions. These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously

More information

Structure of Computer Systems

Structure of Computer Systems 288 between this new matrix and the initial collision matrix M A, because the original forbidden latencies for functional unit A still have to be considered in later initiations. Figure 5.37. State diagram

More information

MaanavaN.Com CS1202 COMPUTER ARCHITECHTURE

MaanavaN.Com CS1202 COMPUTER ARCHITECHTURE DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK SUB CODE / SUBJECT: CS1202/COMPUTER ARCHITECHTURE YEAR / SEM: II / III UNIT I BASIC STRUCTURE OF COMPUTER 1. What is meant by the stored program

More information

Appendix C: Pipelining: Basic and Intermediate Concepts

Appendix C: Pipelining: Basic and Intermediate Concepts Appendix C: Pipelining: Basic and Intermediate Concepts Key ideas and simple pipeline (Section C.1) Hazards (Sections C.2 and C.3) Structural hazards Data hazards Control hazards Exceptions (Section C.4)

More information

Hardware-based Speculation

Hardware-based Speculation Hardware-based Speculation Hardware-based Speculation To exploit instruction-level parallelism, maintaining control dependences becomes an increasing burden. For a processor executing multiple instructions

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

In embedded systems there is a trade off between performance and power consumption. Using ILP saves power and leads to DECREASING clock frequency.

In embedded systems there is a trade off between performance and power consumption. Using ILP saves power and leads to DECREASING clock frequency. Lesson 1 Course Notes Review of Computer Architecture Embedded Systems ideal: low power, low cost, high performance Overview of VLIW and ILP What is ILP? It can be seen in: Superscalar In Order Processors

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts

CPE Computer Architecture. Appendix A: Pipelining: Basic and Intermediate Concepts CPE 110408443 Computer Architecture Appendix A: Pipelining: Basic and Intermediate Concepts Sa ed R. Abed [Computer Engineering Department, Hashemite University] Outline Basic concept of Pipelining The

More information

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise

What is Superscalar? CSCI 4717 Computer Architecture. Why the drive toward Superscalar? What is Superscalar? (continued) In class exercise CSCI 4717/5717 Computer Architecture Topic: Instruction Level Parallelism Reading: Stallings, Chapter 14 What is Superscalar? A machine designed to improve the performance of the execution of scalar instructions.

More information

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition The Processor - Introduction

More information

14:332:331 Pipelined Datapath

14:332:331 Pipelined Datapath 14:332:331 Pipelined Datapath I n s t r. O r d e r Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 Single Cycle Disadvantages & Advantages Uses the clock cycle inefficiently the clock cycle must be timed to accommodate

More information

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor. COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor The Processor - Introduction

More information

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently

Pipelining. Each step does a small fraction of the job All steps ideally operate concurrently Pipelining Computational assembly line Each step does a small fraction of the job All steps ideally operate concurrently A form of vertical concurrency Stage/segment - responsible for 1 step 1 machine

More information

Materials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming

Materials: 1. Projectable Version of Diagrams 2. MIPS Simulation 3. Code for Lab 5 - part 1 to demonstrate using microprogramming CPS311 Lecture: CPU Control: Hardwired control and Microprogrammed Control Last revised October 23, 2015 Objectives: 1. To explain the concept of a control word 2. To show how control words can be generated

More information

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1

CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 CISC 662 Graduate Computer Architecture Lecture 13 - CPI < 1 Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Practice Problems (Con t) The ALU performs operation x and puts the result in the RR The ALU operand Register B is loaded with the contents of Rx

Practice Problems (Con t) The ALU performs operation x and puts the result in the RR The ALU operand Register B is loaded with the contents of Rx Microprogram Control Practice Problems (Con t) The following microinstructions are supported by each CW in the CS: RR ALU opx RA Rx RB Rx RB IR(adr) Rx RR Rx MDR MDR RR MDR Rx MAR IR(adr) MAR Rx PC IR(adr)

More information

COMPUTER ORGANIZATION AND DESIGN

COMPUTER ORGANIZATION AND DESIGN COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

Superscalar Processors Ch 14

Superscalar Processors Ch 14 Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency?

Superscalar Processing (5) Superscalar Processors Ch 14. New dependency for superscalar case? (8) Output Dependency? Superscalar Processors Ch 14 Limitations, Hazards Instruction Issue Policy Register Renaming Branch Prediction PowerPC, Pentium 4 1 Superscalar Processing (5) Basic idea: more than one instruction completion

More information

Superscalar Machines. Characteristics of superscalar processors

Superscalar Machines. Characteristics of superscalar processors Superscalar Machines Increasing pipeline length eventually leads to diminishing returns longer pipelines take longer to re-fill data and control hazards lead to increased overheads, removing any performance

More information

UNIT- 5. Chapter 12 Processor Structure and Function

UNIT- 5. Chapter 12 Processor Structure and Function UNIT- 5 Chapter 12 Processor Structure and Function CPU Structure CPU must: Fetch instructions Interpret instructions Fetch data Process data Write data CPU With Systems Bus CPU Internal Structure Registers

More information

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University

Lecture 9. Pipeline Hazards. Christos Kozyrakis Stanford University Lecture 9 Pipeline Hazards Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee18b 1 Announcements PA-1 is due today Electronic submission Lab2 is due on Tuesday 2/13 th Quiz1 grades will

More information

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming

CS 152 Computer Architecture and Engineering. Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering and Computer Sciences University of California at

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 3 Instruction-Level Parallelism and Its Exploitation 1 Branch Prediction Basic 2-bit predictor: For each branch: Predict taken or not

More information

omputer Design Concept adao Nakamura

omputer Design Concept adao Nakamura omputer Design Concept adao Nakamura akamura@archi.is.tohoku.ac.jp akamura@umunhum.stanford.edu 1 1 Pascal s Calculator Leibniz s Calculator Babbage s Calculator Von Neumann Computer Flynn s Classification

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Preventing Stalls: 1

Preventing Stalls: 1 Preventing Stalls: 1 2 PipeLine Pipeline efficiency Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls Ideal pipeline CPI: best possible (1 as n ) Structural hazards:

More information

Computer Architectures. DLX ISA: Pipelined Implementation

Computer Architectures. DLX ISA: Pipelined Implementation Computer Architectures L ISA: Pipelined Implementation 1 The Pipelining Principle Pipelining is nowadays the main basic technique deployed to speed-up a CP. The key idea for pipelining is general, and

More information

EE 4683/5683: COMPUTER ARCHITECTURE

EE 4683/5683: COMPUTER ARCHITECTURE EE 4683/5683: COMPUTER ARCHITECTURE Lecture 4A: Instruction Level Parallelism - Static Scheduling Avinash Kodi, kodi@ohio.edu Agenda 2 Dependences RAW, WAR, WAW Static Scheduling Loop-carried Dependence

More information